Similar literature
1.
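Reviews existing online public opinion monitoring algorithms and groups them into classic and extended algorithms, analyzes their strengths and weaknesses, compares the performance of common algorithms, and surveys and assesses the optimizations and improvements that scholars have proposed for them. These optimization efforts and their results are evaluated from three perspectives: technology, management, and application. The main problems identified are that the algorithms rely on text alone, target a single data type, ignore the differing characteristics and dynamic changes of events and users, and lack an integrated monitoring-system mindset and management mechanisms. Future trends in online public opinion monitoring algorithms are then discussed in terms of the characteristics of online public opinion, its patterns of development, its driving factors, the shortcomings of existing monitoring algorithms, and the monitoring results expected.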

2.
庄媛. 《情报科学》 (Information Science), 2023, 41(2): 150-156
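[Purpose/Significance] Enterprises and governments are paying growing attention to public opinion on trending online topics, and more and more companies, departments, and government bodies rely on public opinion monitoring systems to respond to mass incidents and public pressure that erupt online. In this setting, monitoring public opinion information on trending online topics makes public opinion crises easier to handle. Traditional monitoring methods, which build a term recognition system, suffer from low monitoring efficiency and poor monitoring results. This paper therefore studies monitoring strategies for public opinion information on trending online topics. [Method/Process] An interpretive structural modeling (ISM) model is constructed to analyze the correlations involved in monitoring public opinion information on trending online topics. A term recognition system is established, and burst terms are processed with the K-means algorithm to identify public opinion information on trending topics and to obtain the set of factors that influence its monitoring. A direct relation matrix of these influencing factors is built, and a reachability matrix is derived from it using Boolean algebra rules and the transitivity rule. From these, the association matrix of the influencing factors and the interpretive structural model are constructed, completing the monitoring of public opinion information on trending online topics. [Result/Conclusion] The results show that public opinion heat, attention, influence, and sensitivity, together with netizen sentiment, all affect the monitoring of public opinion information on trending online topics. Monitoring strategies are proposed on this basis. [Innovation/Limitation] To effectively prevent the sudden outbreak of malicious online incidents, the development of public opinion information on trending online topics must be tracked comprehensively and promptly; by analyzing the monitoring of public opinion information on trending online topics...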
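
The abstract above derives a reachability matrix from a direct relation matrix using Boolean algebra. The following is a minimal sketch of that one step, not the paper's implementation. It assumes the direct relation matrix is a square 0/1 matrix and uses Warshall's algorithm, which gives the same result as repeatedly Boolean-multiplying (A + I) by itself until it stops changing. The function name `reachability_matrix` and the three-factor example are hypothetical.

```python
def reachability_matrix(adjacency):
    """Return the Boolean reachability matrix of a square 0/1 direct relation matrix."""
    n = len(adjacency)
    # Start from A + I: every factor is taken to reach itself.
    reach = [[1 if i == j or adjacency[i][j] else 0 for j in range(n)]
             for i in range(n)]
    # Warshall's algorithm: allow each factor k in turn as an intermediate step.
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = 1
    return reach


# Hypothetical direct relations among three factors: 0 -> 1 and 1 -> 2.
A = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
R = reachability_matrix(A)
```

In ISM, the rows and columns of such a matrix are then used to partition the factors into levels; that partitioning step is not shown here.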

3.
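In recent years, emergencies involving university students have occurred frequently, and most of them are closely tied to online public opinion. To keep harmful online public opinion from spreading and evolving, a rapid monitoring and early-warning mechanism is needed. By analyzing the characteristics of online public opinion at the different stages of university emergencies, the paper extracts their common features and uses them as an indicator system for monitoring and early warning, and from this builds a monitoring and early-warning mechanism suited to university public opinion information.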

4.
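Studying the problems of government media human resource management and their causes in depth, reclassifying personnel according to the characteristics of online public opinion response, and proposing practical optimization strategies all matter for the management efficiency of government media and for the efficiency of responding to online public opinion. This paper analyzes government media human resources in depth from the perspective of online public opinion response. Based on three goals of optimized human resource management (the right person in every post, every person's talents put to full use, and every task delivering results) and four principles (matching ability to level, seamless handover, dynamic adjustment, and complementary strengths), it proposes strategies for optimizing government media human resource management for online public opinion response. Under these strategies, government media human resources can deliver the greatest function at the smallest scale, so that online public opinion can be handled more efficiently when it arises.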

5.
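Analyzes the results achieved by government monitoring of online public opinion on emergencies and points out problems such as insufficient attention from local governments, delayed responses, and improper handling. It then analyzes the causes of the difficulties in monitoring online public opinion on emergencies: complex circumstances, lagging development of dedicated legislation, and imperfect mechanisms.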

6.
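A review of research on indicator systems for online public opinion monitoring and early warning   Total citations: 3 (self-citations: 0, citations by others: 3)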
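Online public opinion monitoring and early warning are receiving growing attention from government departments, enterprises, and public institutions. Existing indicator systems for monitoring and early warning examine the propagation characteristics, topic features, and content value of online public opinion from the perspectives of communication studies, public administration, and information science, and together they capture most of the relevant monitoring points. Closer analysis, however, shows that some indicator systems lack depth and are hard to evaluate, are incomplete, or omit key elements such as audience inclination.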

7.
高承实, 荣星, 陈越, 邬江兴. 《情报杂志》 (Journal of Intelligence), 2012, 31(5): 36-39, 79
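Reviews several typical early-warning indicator systems for online public opinion. Building on an in-depth study of the characteristics and patterns of online public opinion events, the paper converts event-based early-warning monitoring, which is hard to observe, into monitoring over different topic categories, which is easy to observe. It defines three indices (an online public opinion impact index, a trend index, and an accumulation index), gives the meaning and calculation method of each, and finally discusses how the indices can be used in combination.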

8.
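This paper describes how online public opinion forms in a big data environment. In particular, it analyzes the challenges currently facing police-related online public opinion and the prospects and problems of applying big data technology to public opinion monitoring. It also proposes measures for improving online public opinion monitoring, notably applying big data thinking and methods to online public opinion analysis, and offers ideas that future work on public opinion monitoring can draw on.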

9.
王世文, 杨晨雁, 刘劲, 邵琦. 《情报杂志》 (Journal of Intelligence), 2023, (11): 105-112
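[Research purpose] The spread of online public opinion information triggered by major emergencies affects, to some extent, the government's efforts to build credibility and the stability of society. This paper constructs a metadata framework for online public opinion on major emergencies in order to standardize the content and structure of such opinion and to support government emergency decision-making and public opinion early warning. [Research method] Through empirical work including website browsing, case analysis, and literature review, and using both induction and deduction, the paper analyzes the various information sources of online public opinion triggered by major emergencies and their constituent elements. On this basis, it selects core metadata elements and constructs the metadata framework. [Research conclusion] Drawing on the concepts and characteristics of the various information sources, the framework is built around four core descriptive perspectives: the object of public opinion, the subject of public opinion, the opinion itself, and the carrier of public opinion. Elements specific to major emergencies are defined, and the framework's descriptive effectiveness is tested in application. The framework supports the storage and use of online public opinion data on major emergencies.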

10.
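Online public opinion analysis has attracted researchers' attention and produced many results. This paper reviews recent research in the field, dissects the architecture of existing online public opinion analysis systems, analyzes the supporting technologies for data collection, data preprocessing, data analysis, and public opinion presentation, and discusses the directions in which such systems may develop.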

11.
Preprocessing is one of the key components in a typical text classification framework. This paper extensively examines the impact of preprocessing on text classification with respect to classification accuracy, text domain, text language, and dimension reduction. To this end, all possible combinations of widely used preprocessing tasks are comparatively evaluated on two domains, e-mail and news, and in two languages, Turkish and English. In this way, the contribution of the preprocessing tasks to classification success at various feature dimensions, possible interactions among these tasks, and the dependency of these tasks on the respective languages and domains are comprehensively assessed. Experimental analysis on benchmark datasets reveals that choosing appropriate combinations of preprocessing tasks, rather than enabling or disabling them all, may significantly improve classification accuracy, depending on the domain and language studied.
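
The idea of evaluating every on/off combination of preprocessing tasks can be sketched as below. This is only an illustration: the abstract does not name the tasks, so the two used here (lowercasing and stop-word removal), the tiny stop-word list, and the function names are all assumptions. A real study would feed each configuration's output into a classifier and compare accuracies.

```python
from itertools import product

# Tiny illustrative stop-word list (an assumption, not from the paper).
STOP_WORDS = {"the", "a", "of", "is"}


def preprocess(text, lowercase, remove_stop_words):
    """Tokenize on whitespace, then apply the enabled preprocessing tasks."""
    tokens = text.split()
    if lowercase:
        tokens = [t.lower() for t in tokens]
    if remove_stop_words:
        tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
    return tokens


def all_configurations(text):
    """Map every (lowercase, remove_stop_words) flag combination to its token list."""
    return {
        flags: preprocess(text, *flags)
        for flags in product([False, True], repeat=2)
    }


configs = all_configurations("The price of Gold is rising")
```

With two binary tasks there are four configurations; each further task doubles the count, which is why an exhaustive comparison stays feasible only for a small set of tasks.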

12.
Text categorization pertains to the automatic learning of a text categorization model from a training set of preclassified documents on the basis of their contents and the subsequent assignment of unclassified documents to appropriate categories. Most existing text categorization techniques deal with monolingual documents (i.e., documents written in the same language) during both the learning of the text categorization model and the category assignment (or prediction) for unclassified documents. However, with the globalization of business environments and advances in Internet technology, an organization or individual may generate and categorize documents in one language and subsequently archive documents in different languages into the existing categories, which necessitates cross-lingual text categorization (CLTC). Specifically, cross-lingual text categorization deals with learning a text categorization model from a set of training documents written in one language (e.g., L1) and then classifying new documents in a different language (e.g., L2). Motivated by the significance of this demand, this study aims to design a CLTC technique with two different category assignment methods, namely, individual-based and cluster-based. Using monolingual text categorization as a performance reference, our empirical evaluation results demonstrate the cross-lingual capability of the proposed CLTC technique. Moreover, the classification accuracy achieved by the cluster-based category assignment method is statistically significantly higher than that attained by the individual-based method.

13.
Many machine learning algorithms have been applied to text classification tasks. In the machine learning paradigm, a general inductive process automatically builds a text classifier by learning, an approach generally known as supervised learning. However, supervised learning approaches have some problems. The most notable is that they require a large number of labeled training documents for accurate learning. While unlabeled documents are plentiful and easily collected, labeled documents are difficult to generate because the labeling must be done by human developers. In this paper, we propose a new text classification method based on unsupervised or semi-supervised learning. The proposed method starts a text classification task with only unlabeled documents and the title word of each category, and then automatically learns a text classifier by using bootstrapping and feature projection techniques. Experimental results show that the proposed method achieves reasonably useful performance compared to a supervised method. If the proposed method is used in a text classification task, building text classification systems will become significantly faster and less expensive.

14.
In recent years, new semistatic word-based byte-oriented text compressors, such as Tagged Huffman and those based on Dense Codes, have shown that it is possible to perform fast direct search over compressed text and decompression of arbitrary text passages over collections reduced to around 30–35% of their original size. Much of their success is due to the use of words as source symbols and a byte-oriented target alphabet. This approach broke with traditional statistical compressors, which use characters as source symbols and a bit-oriented target alphabet.

15.
Pre-trained language models (PLMs), such as BERT, have been successfully employed in two-phase ranking pipelines for information retrieval (IR). Meanwhile, recent studies have reported that the BERT model is vulnerable to imperceptible textual perturbations on quite a few natural language processing (NLP) tasks. As for IR tasks, the currently established BERT re-ranker is mainly trained on large-scale and relatively clean datasets, such as MS MARCO, whereas noisy text is more common in real-world scenarios, such as web search. In addition, the impact of within-document textual noise (perturbations) on retrieval effectiveness remains to be investigated, especially on the ranking quality of the BERT re-ranker, considering its contextualized nature. To address this gap, we carry out exploratory experiments on the MS MARCO dataset to examine whether the BERT re-ranker can still perform well when ranking noisy text. Unfortunately, we observe non-negligible effectiveness degradation of the BERT re-ranker over a total of ten different types of synthetic within-document textual noise. Furthermore, to address the effectiveness losses over textual noise, we propose a novel noise-tolerant model, De-Ranker, which is learned by minimizing the distance between the noisy text and its original clean version. Our evaluation on the MS MARCO and TREC 2019–2020 DL datasets demonstrates that De-Ranker can deal with synthetic textual noise more effectively, with a 3%–4% performance improvement over the vanilla BERT re-ranker. Meanwhile, extensive zero-shot transfer experiments on a total of 18 widely used IR datasets show that De-Ranker can not only tackle natural noise in real-world text, but also achieve a 1.32% improvement on average in terms of cross-domain generalization ability on the BEIR benchmark.

16.
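After explaining the principles of full-text retrieval and the related technologies, the paper analyzes the Oracle Text technology stack and then shows by example how to set up full-text retrieval in two ways: with the Oracle Text manager and with SQL*Plus.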

17.
Transductive classification is a useful way to classify texts when labeled training examples are insufficient. Several algorithms have been proposed to perform transductive classification on text collections represented in a vector space model. However, these algorithms are unfeasible in practical applications because they assume independence among instances or terms and have other drawbacks. Network-based algorithms have emerged to avoid the drawbacks of the algorithms based on the vector space model and to improve transductive classification. Networks are mostly used for label propagation, in which some labeled objects propagate their labels to other objects through the network connections. Bipartite networks are useful for representing text collections as networks and performing label propagation. Generating this type of network avoids requirements such as collections with hyperlinks or citations, computation of similarities among all texts in the collection, and the setup of a number of parameters. In a bipartite heterogeneous network, objects correspond to documents and terms, and the connections are given by the occurrences of terms in documents. Label propagation is performed from documents to terms and then from terms to documents iteratively. Nevertheless, instead of using terms just as a means of label propagation, in this article we propose using the bipartite network structure to define the relevance scores of terms for classes through an optimization process and then propagating these relevance scores to define labels for unlabeled documents. The new document labels are used to redefine the relevance scores of terms, which in turn redefine the labels of unlabeled documents in an iterative process. We demonstrate that the proposed approach surpasses the algorithms for transductive classification based on the vector space model or on networks. Moreover, we demonstrate that the proposed algorithm effectively makes use of unlabeled documents to improve classification and is faster than other transductive algorithms.
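
To make the baseline idea concrete, here is a minimal sketch of plain label propagation on a bipartite document-term network, passing class scores from documents to terms and back. It is not the optimization-based approach the abstract proposes. The averaging update, the clamping of labeled documents, the function name `propagate_labels`, and the toy data are all assumptions made for illustration.

```python
def propagate_labels(doc_terms, seed_labels, classes, iterations=10):
    """Propagate class scores documents -> terms -> documents and return a label per document.

    doc_terms: dict mapping document id -> set of terms it contains.
    seed_labels: dict mapping some document ids -> known class.
    """
    # Labeled documents start one-hot; unlabeled documents start at zero.
    doc_scores = {
        d: {c: 1.0 if seed_labels.get(d) == c else 0.0 for c in classes}
        for d in doc_terms
    }
    # Invert the bipartite edges: term -> documents containing it.
    term_docs = {}
    for d, terms in doc_terms.items():
        for t in terms:
            term_docs.setdefault(t, set()).add(d)

    for _ in range(iterations):
        # Documents -> terms: a term's score is the mean score of its documents.
        term_scores = {
            t: {c: sum(doc_scores[d][c] for d in docs) / len(docs) for c in classes}
            for t, docs in term_docs.items()
        }
        # Terms -> documents: unlabeled documents take the mean of their terms;
        # labeled documents stay clamped to their known class.
        for d, terms in doc_terms.items():
            if d in seed_labels or not terms:
                continue
            doc_scores[d] = {
                c: sum(term_scores[t][c] for t in terms) / len(terms) for c in classes
            }

    return {d: max(classes, key=lambda c: doc_scores[d][c]) for d in doc_terms}


# Toy collection: two labeled and two unlabeled documents (hypothetical data).
docs = {
    "d1": {"goal", "match"},
    "d2": {"vote", "election"},
    "d3": {"goal", "team"},
    "d4": {"election", "poll"},
}
labels = propagate_labels(docs, {"d1": "sport", "d2": "politics"}, ["sport", "politics"])
```

Here "d3" picks up the "sport" label through the shared term "goal", and "d4" picks up "politics" through "election"; no document-to-document similarity is ever computed.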

18.
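Building on support vector machines and genetic algorithms, a new heuristic multi-level text classification algorithm is proposed. Experimental results demonstrate the algorithm's feasibility and effectiveness. Text classification is an effective way to handle large-scale text processing.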

19.
Question-answering has become one of the most popular information retrieval applications. Although most question-answering systems try to improve the user experience and the technology used to find relevant results, many difficulties remain because of the continuous increase in the amount of web content. Question Classification (QC) plays an important role in question-answering systems, and one of the major tasks in enhancing the classification process is identifying question types. A broad range of QC approaches has been proposed to address the classification problem; most of these are based on bag-of-words or dictionaries. In this research, we present an analysis of the different types of questions based on their grammatical structure. We identify different patterns and use machine learning algorithms to classify them. A framework is proposed for question classification using a grammar-based approach (GQCC) which exploits the structure of the questions. Our findings indicate that using syntactic categories related to different domain-specific types of Common Nouns, Numeral Numbers and Proper Nouns enables the machine learning algorithms to better differentiate between question types. The paper presents a wide range of experiments; the results show that GQCC using the J48 classifier outperforms other classification methods, with 90.1% accuracy.

20.
范宇中, 张玉峰. 《情报科学》 (Information Science), 2003, 21(1): 103-105
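Combining the principles and techniques of information management and artificial intelligence, this paper discusses methods for automatically classifying textual knowledge, including automatic categorization and clustering, instance-based learning for classification, and feature-value-based meta-learning.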
