Research on Automatic Classification of Digital Document Based on Semantic Extension
Cite this article: Ba Zhichao, Zhu Shiwei, Yu Junfeng, Wei Moji. Research on Automatic Classification of Digital Document Based on Semantic Extension[J]. 现代情报 (Journal of Modern Information), 2015, 35(9): 70-74.
Authors: Ba Zhichao, Zhu Shiwei, Yu Junfeng, Wei Moji
Institution: Information Research Institute of Shandong Academy of Sciences, Jinan, Shandong 250014, China
Abstract: Digital documents such as books and journal articles contain few textual features, so their feature vectors express semantics imprecisely and classification performance suffers. To address this problem, this paper proposes a digital-document classification method based on semantic extension of features. First, the method uses TF-IDF to select core feature words: terms with high TF-IDF values that represent a document's text well. Second, it extends this core feature set with semantic concepts drawn from the HowNet semantic dictionary and from the open knowledge base Wikipedia, building a concept vector space that is low-dimensional yet semantically rich. Finally, it builds classifiers with several algorithms, including MaxEnt and SVM, to classify the digital documents automatically. Experimental results show that, compared with traditional short-text classification methods based on feature selection, the method effectively extends the semantics of short-text features and improves classification performance on digital documents.
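The abstract names the steps of the method but not their parameters. As a rough illustration only, the Python sketch below runs the first two steps on toy data: it picks each document's top-k TF-IDF terms as core feature words, then maps them onto concepts through a small hand-written dictionary. The dictionary `CONCEPTS`, the helper names, the length-normalized TF with log(N/df) IDF weighting, and the choice to keep unmapped words as their own dimensions are all assumptions made for this sketch, not the authors' implementation; HowNet and Wikipedia are not actually queried.

```python
import math
from collections import Counter

# HYPOTHETICAL toy concept map standing in for HowNet sememes or
# Wikipedia concepts; a real system would query those resources instead.
CONCEPTS = {
    "svm": ["machine_learning"],
    "classifier": ["machine_learning"],
    "kernel": ["machine_learning", "mathematics"],
    "protein": ["biology"],
    "neuron": ["biology"],
}


def tfidf_core_words(docs, k):
    """Return, for each tokenized document, up to k terms ranked by TF-IDF."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    core = []
    for doc in docs:
        tf = Counter(doc)
        scores = {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        # Highest score first; ties broken alphabetically so output is stable.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        # Terms found in every document score 0 and carry no signal.
        core.append([t for t in ranked if scores[t] > 0][:k])
    return core


def extend_to_concepts(words, concept_map):
    """Map core words to concept counts; unmapped words are kept as-is."""
    vec = Counter()
    for w in words:
        for concept in concept_map.get(w, [w]):
            vec[concept] += 1
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(key, 0) for key, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


docs = [
    ["svm", "kernel", "paper", "paper"],
    ["classifier", "paper"],
    ["protein", "neuron", "paper"],
]
core = tfidf_core_words(docs, k=2)
concepts = [extend_to_concepts(words, CONCEPTS) for words in core]
print(core)
print(cosine(Counter(core[0]), Counter(core[1])))  # word level: no shared core words
print(cosine(concepts[0], concepts[1]))            # concept level: shared machine_learning
```

In this toy data the first two documents share no core words, so their word-level cosine similarity is 0, yet both map to the `machine_learning` concept and become similar in concept space. That is the effect the semantic extension aims for on sparse short texts. The resulting concept vectors would then be passed to a classifier such as MaxEnt or SVM, the final step the abstract describes.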