Improved k-means clustering algorithm

Funding:	The National Natural Science Foundation of China (No. 50674086), the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20060290508), and the Youth Scientific Research Foundation of China University of Mining and Technology (No. 2006A047)

Revised:	2007-05-18

Xia Shixiong,Li Wenchao,Zhou Yong,Zhang Lei,Niu Qiang.Improved k-means clustering algorithm[J].Journal of Southeast University(English Edition),2007,23(3):435-438.

Authors:	Xia Shixiong, Li Wenchao, Zhou Yong, Zhang Lei, Niu Qiang

Institution:	School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221008, China

Abstract:	To address two drawbacks of the k-means algorithm, namely that the number of clusters must be known in advance and that the result is sensitive to the choice of initial cluster centers, an improved k-means clustering algorithm is proposed. First, the concept of the silhouette coefficient is introduced, and the optimal number of clusters K_opt for a data set without prior class information is determined by calculating the silhouette coefficients of the objects in the clusters obtained under different values of K. Then the distribution of the data set is obtained through agglomerative hierarchical clustering, and the initial cluster centers are determined from it. Finally, the clustering is completed by the traditional k-means algorithm. Theoretical analysis shows that the improved algorithm has moderate computational complexity. Experimental results on the IRIS test data set show that the algorithm distinguishes different clusters reasonably, identifies outliers effectively, and produces result clusters with relatively low entropy.
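
The three steps summarized in the abstract can be illustrated with the minimal pure-Python sketch below. It is not the authors' implementation, and several details are assumptions not stated in the abstract: Euclidean distance, centroid-linkage merging for the agglomerative step, a search over K = 2..k_max (k_max is a hypothetical parameter), and scoring each K by the mean silhouette coefficient of the final k-means partition, with the best-scoring K taken as K_opt. For an object i, a(i) is its mean distance to the other members of its own cluster, b(i) is its smallest mean distance to the members of any other cluster, and s(i) = (b(i) - a(i)) / max(a(i), b(i)). All function names are hypothetical, and the outlier detection mentioned in the abstract is not shown.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Result = Tuple[float, int, List[int], List[Point]]


def dist(p: Sequence[float], q: Sequence[float]) -> float:
    """Euclidean distance (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def silhouette(points: List[Point], labels: List[int]) -> float:
    """Mean of s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all objects."""
    total = 0.0
    for i, p in enumerate(points):
        # Sum and count of distances from p to the other members of each cluster.
        sums: Dict[int, float] = {}
        counts: Dict[int, int] = {}
        for j, q in enumerate(points):
            if j != i:
                sums[labels[j]] = sums.get(labels[j], 0.0) + dist(p, q)
                counts[labels[j]] = counts.get(labels[j], 0) + 1
        means = {c: sums[c] / counts[c] for c in sums}
        own = labels[i]
        others = [m for c, m in means.items() if c != own]
        if own not in means or not others:
            continue  # singleton cluster (or a single cluster overall): s(i) = 0
        a, b = means[own], min(others)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def agglomerative_centers(points: List[Point], k: int) -> List[Point]:
    """Bottom-up merging of the two groups with the closest centroids until k remain."""
    groups = [[p] for p in points]
    while len(groups) > k:
        cents = [centroid(g) for g in groups]
        _, i, j = min(
            (dist(cents[i], cents[j]), i, j)
            for i in range(len(groups))
            for j in range(i + 1, len(groups))
        )
        groups[i].extend(groups.pop(j))  # j > i, so index i is unaffected
    return [centroid(g) for g in groups]


def kmeans(points: List[Point], centers: List[Point],
           max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Standard Lloyd iterations starting from the given initial centers."""
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda c: dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Move each center to the mean of its members (keep it if the cluster emptied).
        centers = [
            centroid([p for p, lab in zip(points, labels) if lab == c])
            if c in labels else centers[c]
            for c in range(len(centers))
        ]
    return labels, centers


def improved_kmeans(points: List[Point], k_max: int) -> Result:
    """Try K = 2..k_max; return (silhouette, K_opt, labels, centers) of the best K."""
    best: Optional[Result] = None
    for k in range(2, min(k_max, len(points) - 1) + 1):
        labels, centers = kmeans(points, agglomerative_centers(points, k))
        score = silhouette(points, labels)
        if best is None or score > best[0]:
            best = (score, k, labels, centers)
    if best is None:
        raise ValueError("need at least 3 points and k_max >= 2")
    return best


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    score, k, labels, centers = improved_kmeans(pts, k_max=4)
    print(k, labels)  # prints: 2 [0, 0, 0, 0, 1, 1, 1, 1]
```

On two well-separated groups of four points, K = 2 gives a mean silhouette above 0.9, while splitting either group (K = 3 or 4) lowers it, so K = 2 is selected. The naive merging loop recomputes every pairwise centroid distance at each merge, so it is only suited to small data sets such as IRIS (150 objects).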

Keywords:	clustering; k-means algorithm; silhouette coefficient
This article is indexed in CNKI, VIP, and other databases.
