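Research on Microblog Subject Word Extraction Based on Semantic Concepts and Word Co-occurrence
Cite this article: ZHANG Xiao-fei, CHEN Hang-xing, ZHANG Chun-hua. Research on Microblog Subject Word Extraction Based on Semantic Concepts and Word Co-occurrence [J]. Information Science (情报科学), 2021, 39(1): 142-147.
Authors: ZHANG Xiao-fei, CHEN Hang-xing, ZHANG Chun-hua
Affiliations: Library, Xizang Minzu University; School of Journalism and Communication, Xizang Minzu University
Funding: National Social Science Fund of China, Western Region Project "Research on the Transformation and Governance Strategies of Online Public Opinion in Tibetan Areas in the We-Media Environment" (18XXW010); Ministry of Education Humanities and Social Sciences Research Planning Fund, Tibet Project "Research on User Profiling and Its Application in Tibetan University Libraries in the Smart Campus Environment" (19XZJA870001); Tibet Autonomous Region Higher Education Humanities and Social Sciences Research Project "Research on Hot Topic Discovery Methods Based on Public Opinion Analysis of Tibetan-Language Online Media" (SK2017-13).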
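Abstract: 【Purpose/significance】To extract accurate subject words from massive volumes of microblog posts, so as to provide a valuable reference for governments and enterprises conducting public opinion analysis. 【Method/process】After analyzing the characteristics and shortcomings of traditional microblog subject word extraction methods, the paper proposes a method based on semantic concepts and word co-occurrence. The method uses a text expansion strategy to expand each microblog from a short text into a longer one, uses a semantic dictionary to extend the words in the microblog text to semantic concepts, assigns word weights according to the structural characteristics of microblog text, and then takes the degree of word co-occurrence into account to extract the subject words. 【Result/conclusion】The experimental results show that the proposed algorithm outperforms traditional methods and effectively improves the performance of microblog subject word extraction. 【Innovation/limitation】Combining semantic concepts with word co-occurrence to extract microblog subject words is a new exploration. Because the word segmentation method used in the algorithm may segment a few new Internet words inappropriately, it has a slight impact on the accuracy of keyword extraction.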

Keywords: microblog; subject words; semantic concepts; word co-occurrence; feature words

Microblog Subject Words Extract Based on Semantic Concept and Word Co-occurrence
ZHANG Xiao-fei,CHEN Hang-xing,ZHANG Chun-hua.Microblog Subject Words Extract Based on Semantic Concept and Word Co-occurrence[J].Information Science,2021,39(1):142-147.
Authors:ZHANG Xiao-fei  CHEN Hang-xing  ZHANG Chun-hua
Institution:(Library of Xizang Minzu University, Xianyang 712082, China; School of Journalism & Communication, Xizang Minzu University, Xianyang 712082, China)
Abstract:【Purpose/significance】To extract accurate subject words from massive volumes of microblog posts, so as to provide a valuable reference for governments and enterprises analyzing public opinion.【Method/process】After analyzing the characteristics and shortcomings of traditional microblog subject word extraction methods, this paper proposes an extraction method based on semantic concepts and word co-occurrence. The method uses a text expansion strategy to expand each microblog from a short text into a longer one, applies a semantic dictionary to extend the words in the text to semantic concepts, assigns word weights according to the structural characteristics of microblog text, and then considers the degree of word co-occurrence to extract the subject words.【Result/conclusion】The experimental results show that the subject word extraction algorithm proposed in this paper outperforms traditional methods and effectively improves the performance of microblog subject word extraction.【Innovation/limitation】Combining semantic concepts with word co-occurrence to extract microblog subject words is a new exploration. Because the word segmentation method used in the algorithm may segment some new Internet words inappropriately, it has a slight impact on the accuracy of keyword extraction.
Keywords:microblog  subject words  semantic concepts  word co-occurrence  feature word
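
The abstract names four steps (text expansion, semantic concept extension, structure-based weighting, and word co-occurrence) but gives no formulas. The following is only a minimal sketch of how such a pipeline could fit together, not the authors' algorithm. Everything specific in it is an assumption made for illustration: the toy `CONCEPTS` dictionary standing in for a semantic dictionary, the `FIELD_WEIGHTS` values, treating a post's comments as its expanded text, and the scoring formula. Input is assumed to be already segmented into words.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stand-in for a semantic dictionary: word -> concept label.
CONCEPTS = {"phone": "device", "handset": "device", "cost": "price", "fee": "price"}

# Hypothetical structural weights: hashtag-topic words count more than
# body words, and body words more than comment words.
FIELD_WEIGHTS = {"topic": 3.0, "body": 1.0, "comments": 0.5}


def to_concept(word):
    """Map a word to its concept; unknown words stand for themselves."""
    return CONCEPTS.get(word, word)


def extract_subject_words(posts, top_k=3):
    """Rank concepts by structure-weighted frequency boosted by co-occurrence."""
    weight = Counter()  # structure-weighted frequency of each concept
    cooc = Counter()    # number of posts in which a concept pair co-occurs
    for post in posts:
        seen = set()
        # Text expansion (assumed form): a post is scored together with
        # its comments, so the short text becomes a longer one.
        for field, field_weight in FIELD_WEIGHTS.items():
            for word in post.get(field, []):
                concept = to_concept(word)
                weight[concept] += field_weight
                seen.add(concept)
        for a, b in combinations(sorted(seen), 2):
            cooc[(a, b)] += 1

    # Co-occurrence degree of a concept: total count of pairs it appears in.
    degree = Counter()
    for (a, b), count in cooc.items():
        degree[a] += count
        degree[b] += count
    max_degree = max(degree.values(), default=0) or 1

    # Assumed scoring rule: weighted frequency scaled by normalized degree.
    scores = {c: weight[c] * (1.0 + degree[c] / max_degree) for c in weight}
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


EXAMPLE_POSTS = [
    {"topic": ["phone"], "body": ["new", "phone", "cost"], "comments": ["handset", "fee"]},
    {"topic": ["phone"], "body": ["handset", "battery"], "comments": []},
]
print(extract_subject_words(EXAMPLE_POSTS))
```

In this sketch, mapping "phone" and "handset" to one concept lets their evidence accumulate under a single entry, which is the intuition behind semantic concept extension for short, sparse texts. A real system would replace the toy dictionary with an actual semantic lexicon and tune the weights on data.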
This article is indexed in VIP (维普) and other databases.