
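A Text Feature Extraction Method Based on Latent Semantic Indexing and Genetic Algorithms
Cite this article: HAO Zhan-gang (郝占刚), WANG Zheng-ou (王正欧). A Text Feature Extraction Method Based on Latent Semantic Indexing and Genetic Algorithms [J]. 情报科学 (Information Science), 2006, 24(1): 104-107.
Authors: HAO Zhan-gang (郝占刚)  WANG Zheng-ou (王正欧)
Affiliation: Institute of Systems Engineering, Tianjin University, Tianjin 300072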
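Abstract: This paper uses latent semantic indexing (LSI) and a genetic algorithm (GA) for text feature extraction. LSI brings the semantic relations among terms into the vector space model (VSM), and singular value decomposition (SVD) effectively reduces the dimensionality of the vector space. However, the text features that remain after this reduction still span several hundred dimensions, so the paper applies a genetic algorithm to reduce the dimensionality further. Experimental results show that combining the two methods greatly reduces the dimensionality of the text vector space and improves classification accuracy.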

Keywords: feature extraction  latent semantic indexing  genetic algorithm  Kohonen network
Article ID: 1007-7634(2006)01-0104-04
Received: 2005-02-23
Revised: 2005-02-23

The Method of Text Feature Selection Based on LSI and GA
HAO Zhan-gang, WANG Zheng-ou. The Method of Text Feature Selection Based on LSI and GA [J]. Information Science, 2006, 24(1): 104-107.
Authors: HAO Zhan-gang  WANG Zheng-ou
Institution: Institute of Systems Engineering, Tianjin University, Tianjin 300072, China
Abstract: This paper selects text features using latent semantic indexing (LSI) and a genetic algorithm (GA). LSI is used to reflect the relations among words in the vector space model (VSM), whose dimensionality can be reduced greatly by singular value decomposition (SVD). However, the text features still have several hundred dimensions after this step, so the paper applies a GA on top of the LSI result to reduce the dimensionality further. Experimental results indicate that combining the two methods greatly reduces the dimensionality of the VSM and improves the precision of text classification.
Keywords: feature selection  latent semantic indexing  genetic algorithm  Kohonen network
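
The two-stage reduction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no detail on the fitness function, GA operators, or parameters, so everything below is an assumption chosen for illustration. In particular, the helper names (`lsi_project`, `centroid_accuracy`, `ga_select`), the nearest-centroid fitness proxy (the paper's keywords mention a Kohonen network, which is not used here), the size penalty, and the synthetic term-document matrix are all hypothetical.

```python
import numpy as np

def lsi_project(term_doc, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    # term_doc has shape (n_terms, n_docs); keep the k largest singular values.
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Rows of the result are documents: (S_k V_k^T) transposed.
    return (np.diag(s[:k]) @ vt[:k]).T

def centroid_accuracy(x, y):
    """Training accuracy of a nearest-centroid classifier (assumed fitness proxy)."""
    classes = np.unique(y)
    centroids = np.stack([x[y == c].mean(axis=0) for c in classes])
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((classes[dists.argmin(axis=1)] == y).mean())

def ga_select(x, y, pop_size=20, generations=30, mutation_rate=0.05,
              penalty=0.01, seed=0):
    """Evolve a boolean mask over latent dimensions; return the best evaluated mask."""
    rng = np.random.default_rng(seed)
    n_dims = x.shape[1]

    def fitness(mask):
        if not mask.any():
            return -1.0  # an empty feature set is never acceptable
        # Reward accuracy, lightly penalise the fraction of dimensions kept.
        return centroid_accuracy(x[:, mask], y) - penalty * mask.mean()

    pop = rng.random((pop_size, n_dims)) < 0.5
    best, best_fit = None, -np.inf
    for _ in range(generations):
        scores = np.array([fitness(m) for m in pop])
        if scores.max() > best_fit:
            best_fit, best = scores.max(), pop[scores.argmax()].copy()
        # Tournament selection: each parent is the fitter of two random individuals.
        a, b = rng.integers(pop_size, size=(2, pop_size))
        parents = np.where((scores[a] >= scores[b])[:, None], pop[a], pop[b])
        # Uniform crossover between each parent and a shuffled partner.
        partners = parents[rng.permutation(pop_size)]
        cross = rng.random((pop_size, n_dims)) < 0.5
        children = np.where(cross, parents, partners)
        # Bit-flip mutation.
        pop = children ^ (rng.random((pop_size, n_dims)) < mutation_rate)
    return best

# Synthetic data (an assumption, not from the paper): 40 terms x 30 documents,
# where each of the two classes uses a different half of the vocabulary heavily.
rng = np.random.default_rng(1)
labels = np.repeat([0, 1], 15)
term_ids = np.arange(40)[:, None]
heavy = (term_ids < 20) == (labels[None, :] == 0)
counts = rng.poisson(np.where(heavy, 3.0, 0.5)).astype(float)

docs = lsi_project(counts, k=10)   # stage 1: LSI via SVD
mask = ga_select(docs, labels)     # stage 2: GA over the latent dimensions
print(docs.shape, int(mask.sum()))
```

Here the GA searches over subsets of the latent dimensions produced by the SVD rather than over the original terms, which mirrors the order of the two stages in the abstract. In a real setting the fitness would be measured on held-out documents rather than on the training data.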
This article is indexed in CNKI, 维普 (VIP), 万方数据 (Wanfang Data), and other databases.