A Text Feature Extraction Method Based on Latent Semantic Indexing and Genetic Algorithms (The Method of Text Feature Selection Based on LSI and GA)



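Cite this article:	HAO Zhan-gang, WANG Zheng-ou. A Text Feature Extraction Method Based on Latent Semantic Indexing and Genetic Algorithms [J]. Information Science (情报科学), 2006, 24(1): 104-107.

Authors:	HAO Zhan-gang (郝占刚), WANG Zheng-ou (王正欧)

Affiliation:	Institute of Systems Engineering, Tianjin University, Tianjin 300072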

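Abstract:	This paper uses latent semantic indexing (LSI) and a genetic algorithm (GA) for text feature extraction. LSI embeds semantic relationships in the vector space model (VSM), and singular value decomposition (SVD) effectively reduces the dimensionality of the vector space. However, the text features that remain after this reduction still number several hundred, so the paper applies a genetic algorithm to reduce the dimensionality further. Experimental results show that combining the two methods greatly reduces the dimensionality of the text vector space and also improves classification accuracy.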
Keywords:	feature extraction; latent semantic indexing; genetic algorithm; Kohonen network
Article ID:	1007-7634(2006)01-0104-04
Received:	2005-02-23
Revised:	2005-02-23
The Method of Text Feature Selection Based on LSI and GA

HAO Zhan-gang, WANG Zheng-ou. The Method of Text Feature Selection Based on LSI and GA [J]. Information Science, 2006, 24(1): 104-107.

Authors:	HAO Zhan-gang, WANG Zheng-ou

Institution:	Institute of Systems Engineering, Tianjin University, Tianjin 300072, China

Abstract:	This paper selects text features using latent semantic indexing (LSI) and a genetic algorithm (GA). LSI is used to reflect the relations among words in the vector space model (VSM), and singular value decomposition (SVD) greatly reduces the dimension of the VSM. However, the text features still have several hundred dimensions after this step, so the paper uses a GA to reduce the dimension further on that basis. The experimental results indicate that combining the two methods greatly reduces the dimension of the VSM and improves the precision of text classification.

Keywords:	feature selection; latent semantic index; genetic algorithm; Kohonen network
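
The two-stage reduction described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state the GA encoding, operators, or fitness function, so everything below is an assumption made for illustration. The function names (lsi_reduce, centroid_accuracy, ga_select), the toy data, the nearest-centroid fitness (a stand-in for whatever classifier the paper used; its keywords mention a Kohonen network), and all parameter values are hypothetical.

```python
import numpy as np

def lsi_reduce(term_doc, k):
    """Project documents into a k-dimensional LSI space via truncated SVD."""
    # term_doc has shape (n_terms, n_docs); SVD gives term_doc = U @ diag(s) @ Vt
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the k largest singular values; each row of the result is one document
    return (np.diag(s[:k]) @ vt[:k, :]).T

def centroid_accuracy(x, y, mask):
    """Assumed fitness: training accuracy of a nearest-centroid classifier."""
    if not mask.any():
        return 0.0
    xs = x[:, mask]
    classes = np.unique(y)
    centroids = np.stack([xs[y == c].mean(axis=0) for c in classes])
    dists = ((xs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[dists.argmin(axis=1)]
    return float((pred == y).mean())

def ga_select(x, y, pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    """Evolve boolean masks over the LSI dimensions; return the best mask seen."""
    rng = np.random.default_rng(seed)
    n = x.shape[1]
    pop = rng.random((pop_size, n)) < 0.5
    best_mask, best_fit = pop[0].copy(), -1.0
    for _ in range(generations):
        # Accuracy minus a small penalty per kept dimension (assumed trade-off)
        fit = np.array([centroid_accuracy(x, y, m) - 0.01 * m.sum() / n for m in pop])
        i = int(fit.argmax())
        if fit[i] > best_fit:
            best_fit, best_mask = fit[i], pop[i].copy()
        # Binary tournament selection
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = pop[np.where(fit[a] >= fit[b], a, b)]
        # Single-point crossover between consecutive parents
        children = parents.copy()
        for j in range(0, pop_size - 1, 2):
            cut = int(rng.integers(1, n)) if n > 1 else 0
            children[j, cut:] = parents[j + 1, cut:]
            children[j + 1, cut:] = parents[j, cut:]
        # Bit-flip mutation
        flip = rng.random(children.shape) < mutation_rate
        pop = children ^ flip
    return best_mask

# Toy data (made up): 50 terms, 20 documents, two classes
rng = np.random.default_rng(1)
labels = np.array([0] * 10 + [1] * 10)
term_doc = rng.random((50, 20))
term_doc[:5, labels == 1] += 2.0  # five terms are more frequent in class 1
docs = lsi_reduce(term_doc, k=8)
mask = ga_select(docs, labels)
print(docs.shape, int(mask.sum()))
```

Here SVD first maps the 50-term space to 8 LSI dimensions, and the GA then searches for a subset of those dimensions, mirroring the order of the two steps in the abstract. The feature-count penalty is one possible way to push the search toward fewer dimensions; the paper may handle this differently.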
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.
