Research on Microblog Hot Topic Discovery Based on the Improved BBTM Algorithm
Cite this article: Xiang Zhuoyuan, Wu Yu, Chen Hao, Zhang Fuwei. Research on Microblog Hot Topic Discovery Based on the Improved BBTM Algorithm[J]. Journal of Information (情报杂志), 2022(1).
Authors: Xiang Zhuoyuan (向卓元), Wu Yu (吴玉), Chen Hao (陈浩), Zhang Fuwei (张芙玮)
Institution: School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan 430073
Funding: One of the research outcomes of the National Natural Science Foundation of China General Program project "Research on Domain Knowledge Representation and Fusion Models for Cross-Lingual Opinion Summarization" (No. 71974202).
Abstract: [Research purpose] Mainstream topic discovery models suffer from data sparsity and high dimensionality. This paper proposes BiLSTM-HBBTM, a microblog hot topic discovery method that improves on the bursty biterm topic model (BBTM), with the aim of achieving better performance in microblog hot topic mining. [Research method] First, the microblog propagation value, the term H-index, and the biterm bursty probability are introduced to select features at both the document level and the word level, addressing data sparsity and high dimensionality. Second, a bidirectional long short-term memory (BiLSTM) network is trained to capture the relationships between words; the learned relationships are combined with the inverse document frequency of words as prior knowledge for biterms, so that relationships between words are no longer ignored. Third, a density-based method adaptively selects the optimal number of topics for BBTM, removing the need to specify the number of topics manually as in traditional topic models. Finally, real microblog datasets are used to evaluate the method on hot topic discovery accuracy, topic quality, and consistency. [Research conclusion] Experiments show that BiLSTM-HBBTM outperforms the comparison models on multiple evaluation metrics, supporting the effectiveness and feasibility of the proposed model.

Keywords: hot topic discovery; topic model; microblog; short texts; BiLSTM; BBTM; Word2Vec
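
The abstract names two word-level features, the term H-index and the biterm bursty probability, without giving their formulas. The sketch below is one plausible reading, not the authors' implementation. It assumes the H-index of a term is the largest h such that at least h posts containing the term have a propagation value of at least h, and that a biterm's bursty probability is the share of its current count exceeding its mean count over earlier time slices, with a small floor `eps`. The function names, the `eps` default, and the sample data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def term_h_index(scores):
    """Largest h such that at least h of the scores are >= h.

    `scores` is assumed to hold the propagation values of the posts
    that contain a given term (an assumption, not the paper's definition).
    """
    h = 0
    for rank, score in enumerate(sorted(scores, reverse=True), start=1):
        if score >= rank:
            h = rank
        else:
            break
    return h


def biterm_counts(docs):
    """Count unordered word pairs (biterms) that co-occur in a document."""
    counts = Counter()
    for words in docs:
        # Deduplicate and sort so each pair has one canonical key.
        for pair in combinations(sorted(set(words)), 2):
            counts[pair] += 1
    return counts


def bursty_probability(current, history, eps=0.01):
    """Assumed form: max(current - mean(history), eps) / current."""
    if current <= 0:
        raise ValueError("current count must be positive")
    mean_past = sum(history) / len(history) if history else 0.0
    return max(current - mean_past, eps) / current


# Hypothetical data: two earlier time slices and the current one.
past_slices = [
    [["rain", "traffic"], ["rain", "umbrella"]],
    [["rain", "traffic"]],
]
today = [["quake", "rescue"], ["quake", "rescue", "rain"], ["rain", "traffic"]]

past_counts = [biterm_counts(docs) for docs in past_slices]
for biterm, n_now in sorted(biterm_counts(today).items()):
    history = [counts[biterm] for counts in past_counts]
    print(biterm, round(bursty_probability(n_now, history), 3))

print(term_h_index([5, 3, 3, 1]))  # prints 3
```

In this toy data the pair ("quake", "rescue") never appeared before, so its bursty probability is 1.0, while ("rain", "traffic") occurs at its usual rate and falls to the `eps` floor of 0.01. A filter built on such scores would keep only biterms above some threshold, which is how feature selection of this kind can reduce sparsity and dimensionality.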
This article is indexed in VIP (维普) and other databases.