
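An Improved Algorithm for K-Means Document Clustering
Cite this article: WU Jing-Lan, LIU Yan, ZHU Wen-xing. An Improved Algorithm for K-Means Document Clustering [J]. Journal of Minjiang University, 2004, 25(2): 48-52.
Authors: WU Jing-Lan, LIU Yan, ZHU Wen-xing
Affiliations: 1. Department of Computer Science, Minjiang University, Fuzhou, Fujian 350108
2. Department of Computer Science and Technology, Fuzhou University, Fuzhou, Fujian 350002
Funding: Natural Science Foundation of Fujian Province [A0310013]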
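Abstract: K-means is a widely used local search algorithm. Its main drawback is that it easily gets trapped in a local minimum, and that local minimum often deviates substantially from the global optimum. This paper proposes an iterated local search algorithm for document clustering based on K-means. The algorithm takes the solution produced by K-means as its initial solution and starts a local search from it, accepting some worse solutions during the search. When the solution can no longer be improved, the algorithm applies a perturbation of appropriate strength to the local minimum and then proceeds to the next iteration, so as to escape the local minimum and widen the search. Experimental results show that the algorithm reaches a clustering accuracy of over 99% on document data sets.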

Keywords: K-means; iterated local search document clustering algorithm; local minimum; global optimum; database
Article ID: 1009-7821(2004)02-048-05
Revised: February 1, 2004

An Advanced Algorithm for K-Means Document Clustering
WU Jing-Lan, LIU Yan, ZHU Wen-xing. An Advanced Algorithm for K-Means Document Clustering[J]. Journal of Minjiang University, 2004, 25(2): 48-52.
Authors: WU Jing-Lan, LIU Yan, ZHU Wen-xing
Abstract: K-means is one of the most common local search approaches to clustering problems. Its main drawback is that it often gets trapped in local optima that are significantly worse than the global optimum. This paper presents an iterated local search document clustering algorithm based on K-means. It takes the solution produced by K-means as its initial solution and starts a local search from it; during the search, some worse solutions are accepted. When a solution can no longer be improved, the algorithm applies a perturbation of appropriate strength to the local minimum before the next iteration, in order to escape the local minimum and thereby widen the search. Experimental results indicate that the proposed algorithm achieves a clustering accuracy of over 99% on document data sets.
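The abstract gives only the outline of the method, so the following is a minimal sketch of a generic iterated local search wrapped around K-means, not the authors' actual procedure. It assumes documents are already represented as numeric vectors (for example TF-IDF rows). The function names `kmeans` and `ils_kmeans`, the Gaussian perturbation, and the parameters `strength` and `accept_tol` are all hypothetical choices made for illustration.

```python
import numpy as np

def kmeans(X, centers, n_iter=100):
    """Plain Lloyd's K-means started from the given centers.

    Returns (centers, labels, sse), where sse is the sum of squared
    distances from each point to its assigned center.
    """
    centers = centers.copy()
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    sse = d[np.arange(len(X)), labels].sum()
    return centers, labels, sse

def ils_kmeans(X, k, n_rounds=20, strength=0.5, accept_tol=0.05, seed=0):
    """Iterated local search around K-means (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    init = X[rng.choice(len(X), size=k, replace=False)]
    current = kmeans(X, init)  # the K-means solution is the initial solution
    best = current
    for _ in range(n_rounds):
        # Perturbation (assumed form): Gaussian noise on the current centers.
        noise = rng.normal(scale=strength * X.std(axis=0), size=current[0].shape)
        candidate = kmeans(X, current[0] + noise)  # local search after the kick
        # Accept improvements and also slightly worse solutions.
        if candidate[2] <= current[2] * (1 + accept_tol):
            current = candidate
        if current[2] < best[2]:
            best = current
    return best
```

Calling `centers, labels, sse = ils_kmeans(X, k=3)` on an `(n_documents, n_features)` array returns the best solution seen across all rounds, so its objective value is never worse than that of the initial K-means run. The `strength` parameter plays the role of the "appropriate strength" of the perturbation: too small and the search falls back into the same local minimum, too large and it degenerates into random restarts.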
Keywords: K-means algorithm; document clustering; iterated local search
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.