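Development of a Visualization Tool for the AToT Model
Cite this article: SUN GuoChao, XU Shuo, QIAO XiaoDong. Development of a Visualization Tool for the AToT Model [J]. 情报工程, 2016, 2(4): 020-029.
Authors: SUN GuoChao, XU Shuo, QIAO XiaoDong
Affiliation: Institute of Scientific and Technical Information of China, Beijing 100038
Funding: National Natural Science Foundation of China project "Research on Technology Opportunity Discovery Based on Paper and Patent Resources" (71403255); "Twelfth Five-Year Plan" National Science and Technology Support Program project "Research on Information Service Resource Development and Supporting Technologies for Scientific and Technical Intelligence Analysis" (2015BAH25F01); and the China Knowledge Centre for Engineering Sciences and Technology construction project "Knowledge Organization System Construction" (CKCEST-2016-2-10)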
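Abstract (translated from the Chinese): As the document collections that researchers must handle grow ever larger, topic models such as LDA (Latent Dirichlet Allocation), which mine the latent topics of large-scale collections at the semantic level, are being applied more and more widely. LDA, however, considers only the content of the documents and ignores other important external information. The AToT (Author Topic over Time) model extends LDA with two additional attributes, the authors of a document and its publication time, so that it can not only mine the implicit information in documents but also analyze authors' research interests and how topics change over time. The output of AToT is a set of probability matrices, which cannot present the mined information intuitively, comprehensively, or clearly, especially to researchers unfamiliar with data mining. This paper therefore develops a visualization system based on the AToT model that clearly and attractively presents the relationships among documents, topics, authors, time, and terms: for example, the topic distribution of a document, the term distribution of a topic, the research-interest distribution of an author, the topics similar to a given topic, and the evolution trend of a topic.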

Keywords (translated from the Chinese): LDA model; AToT model; visualization; Django

Development of Visualization Tool for AToT Model
Authors:SUN GuoChao  XU Shuo and QIAO XiaoDong
Institution: Institute of Scientific and Technical Information of China
Abstract: Because the LDA (Latent Dirichlet Allocation) topic model can mine the underlying topics of large-scale document collections at the semantic level, it has been applied successfully in various fields. However, LDA focuses only on the content of documents and ignores other important external information, such as authorship and timestamps. To overcome this limitation, the AToT (Author Topic over Time) model was proposed. It jointly models the authorship and publication time of documents, which allows it to mine the implicit information in the documents and to analyze the research interests of authors and the variation of topics over time. However, the results of these models are difficult to understand, especially for researchers unfamiliar with data mining. Therefore, this study developed a visualization tool for the AToT model. The visualization system clearly shows the relationships among topics, terms, documents, time, and authors: for instance, the distribution of topics in a document, the probability of words in a topic, an author's interests, similar topics, and the trend of a topic over time.
Keywords:LDA model  AToT model  visualization  Django
This article is indexed in Wanfang Data (万方数据) and other databases.