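Research on a Text Clustering Method Combining Max Term Contribution Dimension Reduction with Simulated Annealing
Cite this article: Lu Guoli, Wang Xiaohua, Wang Rongbo. Research on a Text Clustering Method Combining Max Term Contribution Dimension Reduction with Simulated Annealing [J]. 现代图书情报技术 (New Technology of Library and Information Service), 2008, 24(12): 43-47.
Authors: Lu Guoli, Wang Xiaohua, Wang Rongbo
Affiliation: Institute of Computer Application Technology, Hangzhou Dianzi University, Hangzhou 310018
Abstract: This paper proposes a text feature extraction and dimension reduction algorithm based on max term contribution. The basic idea is to use the importance of each term within the document collection: a search algorithm extracts the most important terms from the high-dimensional collection and uses them to build a low-dimensional collection, which achieves feature extraction and dimension reduction together. On this basis, a K-means clustering algorithm improved with simulated annealing is used to cluster the dimension-reduced texts. Experimental results show that the method effectively improves clustering accuracy.

Keywords: text clustering; max term contribution; feature extraction; simulated annealing
Received: 2008-09-02
Revised: 2008-09-24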

Text Clustering Research on the Max Term Contribution Dimension Reduction and Simulated Annealing Algorithm
Lu Guoli,Wang Xiaohua,Wang Rongbo.Text Clustering Research on the Max Term Contribution Dimension Reduction and Simulated Annealing Algorithm[J].New Technology of Library and Information Service,2008,24(12):43-47.
Authors:Lu Guoli  Wang Xiaohua  Wang Rongbo
Institution:(Computer Application Technology Laboratory of Hangzhou Dianzi University, Hangzhou 310018, China)
Abstract: This paper presents an algorithm for text feature extraction and dimension reduction based on Max Term Contribution (MTC). The main idea is to compute the contribution of each term in the high-dimensional document base and then use a search algorithm to extract the terms with the largest contribution, forming a low-dimensional document base. A modified K-means clustering method based on Simulated Annealing (SA) is then applied to the low-dimensional document data obtained by MTC. Experimental results show that the method improves clustering precision.
Keywords: Text clustering; Max term contribution; Feature extraction; Simulated annealing
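
The two steps named in the abstract can be illustrated with a minimal Python sketch. The abstract gives neither the contribution formula nor the annealing schedule, so everything below is an assumption rather than the authors' method: a term's contribution is taken to be the sum, over all ordered pairs of distinct documents, of the product of its weights (for example TF-IDF weights); the "search" is a plain top-k selection; and the annealing step moves one random document to a random cluster, accepting worse moves with probability exp(-delta / temperature) under geometric cooling. The function names (`term_contribution`, `reduce_dimensions`, `sa_kmeans`) and parameters (`t0`, `cooling`, `steps`) are hypothetical.

```python
import math
import random

def term_contribution(weights):
    """weights[d][t] is the weight of term t in document d (e.g. TF-IDF).
    Assumed score: sum over ordered pairs i != j of w[i][t] * w[j][t]."""
    scores = []
    for t in range(len(weights[0])):
        col = [row[t] for row in weights]
        total = sum(col)
        # (sum w)^2 - sum(w^2) equals the sum over ordered pairs i != j.
        scores.append(total * total - sum(w * w for w in col))
    return scores

def reduce_dimensions(weights, k):
    """Keep the k terms with the largest assumed contribution score."""
    scores = term_contribution(weights)
    keep = sorted(sorted(range(len(scores)), key=lambda t: -scores[t])[:k])
    return [[row[t] for t in keep] for row in weights], keep

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cost(points, labels, n_clusters):
    """K-means objective: squared distances of points to their centroid."""
    total = 0.0
    for c in range(n_clusters):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sq_dist(p, centroid) for p in members)
    return total

def sa_kmeans(points, n_clusters, t0=1.0, cooling=0.95, steps=2000, seed=0):
    """Simulated annealing over cluster assignments (assumed schedule)."""
    rng = random.Random(seed)
    labels = [rng.randrange(n_clusters) for _ in points]
    current = cost(points, labels, n_clusters)
    best_labels, best = labels[:], current
    temp = t0
    for _ in range(steps):
        i = rng.randrange(len(points))
        old = labels[i]
        labels[i] = rng.randrange(n_clusters)  # propose moving one point
        new = cost(points, labels, n_clusters)
        delta = new - current
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = new  # accept, possibly uphill while temp is high
            if current < best:
                best_labels, best = labels[:], current
        else:
            labels[i] = old  # reject and restore the previous label
        temp = max(temp * cooling, 1e-12)  # geometric cooling
    return best_labels, best
```

A typical call would pass a document-term weight matrix to `reduce_dimensions` and feed the reduced rows to `sa_kmeans`. Recomputing the full cost on every move keeps the sketch short but is slow on real corpora; an incremental centroid update would be the natural refinement.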
This article is indexed in CNKI, Wanfang Data, and other databases.