
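Multi-Class Text Representation and Classification Based on Hybrid Deep Belief Networks
Cite this article:ZHAI WenJie, YAN Yan, ZHANG BoWen, YIN XuCheng. Multi-Class Text Representation and Classification Based on Hybrid Deep Belief Networks[J]. 情报工程, 2016, 2(5): 030-040.
Authors:ZHAI WenJie  YAN Yan  ZHANG BoWen  YIN XuCheng
Affiliations:1. Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083; 2. Department of Computer Science and Technology, China University of Mining and Technology, Beijing 100083
Funding:This work was supported by the National Natural Science Foundation of China project "Natural scene text recognition combining feedforward and feedback mechanisms" (61473036).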
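Abstract:This paper studies a multi-class text representation and classification method based on hybrid deep belief networks. It aims to address two problems of the traditional Bag-of-Words (BOW) representation: it ignores the semantic information of the text, and the features it extracts are high-dimensional and highly sparse. Working from text keywords and targeting multi-class classification tasks (such as news text and biomedical text), the paper takes the word-vector representations of the keywords as the text input and combines a Deep Belief Network (DBN) with a Deep Boltzmann Machine (DBM) to design a Hybrid Deep Belief Network (HDBN) model. Experimental results on text classification and text retrieval show that the deep learning model based on word-vector embeddings outperforms traditional methods. In addition, a two-dimensional visualization experiment shows that the high-level text representations extracted by the HDBN model exhibit high cohesion and low coupling.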

Keywords:text classification  text representation  deep learning  deep belief networks

A Model for Text Representation and Classification Based on Hybrid Deep Belief Networks
Authors:ZHAI WenJie  YAN Yan  ZHANG BoWen and YIN XuCheng
Institution:Department of Computer Science and Technology, University of Science and Technology Beijing; Department of Computer Science and Technology, China University of Mining and Technology; Department of Computer Science and Technology, University of Science and Technology Beijing; Department of Computer Science and Technology, University of Science and Technology Beijing
Abstract:This paper develops a model for text representation and classification based on hybrid deep belief networks. It addresses two problems of the traditional Bag-of-Words text representation method: it ignores semantic relations, and the features it extracts are high-dimensional and highly sparse. Starting from text keywords, we use the word vectors of the keywords as the input for multi-class classification tasks, such as news and biomedical texts, and we propose a new model, HDBN (Hybrid Deep Belief Network), which integrates DBN (Deep Belief Network) and DBM (Deep Boltzmann Machine). The results of text categorization and text retrieval show that the HDBN model performs better than the traditional methods. Moreover, the results of a two-dimensional spatial visualization indicate that the high-level text representation produced by the HDBN model exhibits high cohesion and low coupling.
Keywords:Text classification  text representation  deep learning  deep belief networks
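
The contrast the abstract draws between a Bag-of-Words vector and a keyword word-vector input can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy vocabulary, the random stand-in embeddings, the helper names `bow_vector` and `keyword_embedding_input`, and the choice to concatenate the keyword vectors (the abstract does not say how the keyword vectors are combined).

```python
import numpy as np

def bow_vector(tokens, vocab):
    """Count-based Bag-of-Words vector: one dimension per vocabulary word."""
    index = {word: i for i, word in enumerate(vocab)}
    vec = np.zeros(len(vocab))
    for token in tokens:
        if token in index:
            vec[index[token]] += 1
    return vec

def keyword_embedding_input(keywords, embeddings, dim):
    """Concatenate the word vectors of the keywords (an assumed combination).

    Keywords without an embedding are represented by a zero vector.
    """
    parts = [embeddings.get(k, np.zeros(dim)) for k in keywords]
    return np.concatenate(parts)

# Hypothetical toy data; real vocabularies have tens of thousands of words,
# which is where the high dimensionality and sparsity of BOW come from.
vocab = ["gene", "protein", "market", "stock", "cell", "trade", "bank", "virus"]
doc_tokens = ["gene", "protein", "cell", "gene"]

# Random stand-ins for pretrained word vectors (assumption, not real embeddings).
rng = np.random.default_rng(0)
dim = 4
embeddings = {word: rng.normal(size=dim) for word in vocab}

bow = bow_vector(doc_tokens, vocab)
dense = keyword_embedding_input(["gene", "protein"], embeddings, dim)

print("BOW length:", bow.size, "non-zero entries:", np.count_nonzero(bow))
print("Keyword input length:", dense.size)
```

The BOW vector grows with the vocabulary and is mostly zeros, while the keyword input has a fixed length of (number of keywords) × (embedding dimension) regardless of vocabulary size. A dense vector of this kind is what would be fed to the first layer of a deep model in this sketch.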