Similar literature
 Found 20 similar articles (search time: 93 ms)
1.
The density-based clustering algorithm presented here differs from the classical Density-Based Spatial Clustering of Applications with Noise (DBSCAN) (Ester et al., 1996) and has the following advantages. First, a greedy algorithm replaces the R*-tree (Beckmann et al., 1990) used in DBSCAN to index the clustering space, which greatly reduces the clustering time and also lowers the I/O memory load. Second, the merging condition used to approximate arbitrarily shaped clusters is designed so that a single threshold can correctly distinguish all clusters in a large spatial dataset, even when the dataset contains density-skewed clusters. Finally, the authors examine a robot navigation application and test the proposed algorithm on two artificial datasets to verify its effectiveness and efficiency.

2.
The density-based clustering algorithm presented here differs from the classical Density-Based Spatial Clustering of Applications with Noise (DBSCAN) (Ester et al., 1996) and has the following advantages. First, a greedy algorithm replaces the R*-tree (Beckmann et al., 1990) used in DBSCAN to index the clustering space, which greatly reduces the clustering time and also lowers the I/O memory load. Second, the merging condition used to approximate arbitrarily shaped clusters is designed so that a single threshold can correctly distinguish all clusters in a large spatial dataset, even when the dataset contains density-skewed clusters. Finally, the authors examine a robot navigation application and test the proposed algorithm on two artificial datasets to verify its effectiveness and efficiency. Project (No. 2002AA2010) supported by the Hi-Tech Research and Development Program (863) of China.

3.
Clustering, a powerful data mining technique for discovering interesting data distributions and patterns in the underlying database, is used in many fields, such as statistical data analysis, pattern recognition, image processing, and other business applications. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) (Ester et al., 1996) is a well-performing clustering method for spatial data, although it leaves many problems unsolved. For example, DBSCAN requires a user-specified threshold whose computation by current methods such as OPTICS (Ankerst et al., 1999) is extremely time-consuming, and the performance of DBSCAN under different norms has yet to be examined. In this paper, we first develop a method that determines the required threshold from statistical information about the distance space of the database. Our examination of DBSCAN's performance under different norms then shows that there is a determinable relationship between them. Finally, we use two artificial databases to verify the effectiveness and efficiency of the proposed methods.
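
As background for the threshold and norm questions raised in this abstract, the following is a minimal sketch of classical DBSCAN with a pluggable Minkowski norm. It is not the authors' method: the function names, the brute-force neighbourhood search, and the sample points are assumptions made for illustration only.

```python
from math import inf


def minkowski(a, b, p=2.0):
    """Minkowski distance: p=1 is Manhattan, p=2 Euclidean, p=inf Chebyshev."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if p == inf:
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def dbscan(points, eps, min_pts, p=2.0):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def region_query(i):
        # Brute-force eps-neighbourhood (includes the point itself).
        return [j for j, q in enumerate(points)
                if minkowski(points[i], q, p) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(i)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(j)
            if len(j_neighbours) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbours)
    return labels


# Hypothetical data: two unit squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
euclidean = dbscan(points, eps=1.5, min_pts=4, p=2)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
manhattan = dbscan(points, eps=1.5, min_pts=4, p=1)  # every point is noise (-1)
```

With the same threshold, the Euclidean norm reaches the diagonal corner of each square while the Manhattan norm does not, so the choice of norm changes which points count as core points.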

4.
A statistical information-based clustering approach in distance space   Total citations: 2, self-citations: 0, citations by others: 2
Clustering, as a powerful data mining technique for discovering interesting data distributions and patterns in the underlying database, is used in many fields, such as statistical data analysis, pattern recognition, image processing, and other business applications. Density-based Spatial Clustering of Applications with Noise (DBSCAN) (Ester et al., 1996) is a good performance clustering method for dealing with spatial data although it leaves many problems to be solved. For example, DBSCA…

5.
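Clustering is a core data mining technique, and density-based clustering has proven to be a highly effective family of methods. Using DBSCAN as the point of comparison, this paper proposes a density-based clustering algorithm, Clustering Using Centers and Density (CUCD). The algorithm is built on center points and density: its core objects are virtual points computed from the data distribution, and they become more representative as the number of program executions increases. Experiments show that the algorithm achieves good time efficiency and clustering quality.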

6.
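With the development of smart agriculture, massive amounts of data are emerging from agricultural production. Such data inevitably contain noise, which not only offers little useful value but also interferes with information mining. To address this problem, the density-based DBSCAN clustering algorithm is used to handle anomalous data. Because DBSCAN is sensitive to its parameters, the characteristics of the dataset are combined with statistical reasoning to plot the curve of inter-point distances in ascending order, from which DBSCAN's Eps parameter is estimated. Simulation results show that the improved algorithm reaches an average accuracy of 99.6%, 1.7 percentage points higher than the traditional algorithm, and misjudges only 3 data points across 10 detection runs, indicating that this parameter-setting method gives higher accuracy and better stability when handling anomalous data.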
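
The ascending distance curve mentioned in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: it uses the common k-th-nearest-neighbour distance for each point, and the "largest jump" knee rule, the function names, and the sample data are all hypothetical.

```python
from math import dist


def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbour, sorted ascending."""
    result = []
    for i, p in enumerate(points):
        d = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(d[k - 1])
    return sorted(result)


def estimate_eps(points, k):
    """Assumed knee rule: take the curve value just before its largest jump."""
    curve = k_distances(points, k)
    jumps = [curve[i + 1] - curve[i] for i in range(len(curve) - 1)]
    knee = max(range(len(jumps)), key=jumps.__getitem__)
    return curve[knee]


# Hypothetical data: a 3x3 unit grid plus one far-away outlier.
grid = [(x, y) for x in range(3) for y in range(3)]
eps = estimate_eps(grid + [(20, 20)], k=2)  # 1.0
```

On real data the curve is usually inspected visually as well, since a single largest jump can be misleading when clusters have very different densities.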

7.
This paper focuses on document clustering with a clustering algorithm based on a density tree (CABDET) to improve clustering accuracy. CABDET constructs a density-based tree structure for every potential cluster by dynamically adjusting the neighborhood radius according to local density. It avoids the global density parameters of density-based spatial clustering of applications with noise (DBSCAN) and reduces the number of input parameters to one. Experimental results on real documents show that CABDET achieves better clustering accuracy than DBSCAN. CABDET obtains a maximum F-measure of 0.347 with a root-node neighborhood radius of 0.80, which is higher than the 0.332 obtained by DBSCAN with a neighborhood radius of 0.65 and a minimum number of objects of 6.

8.
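An improved k-means clustering algorithm   Total citations: 2, self-citations: 0, citations by others: 2
To address the k-means algorithm's need to know the number of clusters in advance and the difficulty of choosing initial centers, an improved k-means clustering algorithm is proposed. First, the silhouette coefficient is introduced: the optimal number of clusters Kopt for a dataset with no prior class information is determined by computing the silhouette coefficient of each object in the clusters under different values of K. Next, agglomerative hierarchical clustering is used to obtain the distribution of the dataset and determine the initial cluster centers. Finally, conventional k-means completes the clustering. Theoretical analysis shows that the proposed algorithm has moderate computational complexity. Experiments on the IRIS test dataset show that the algorithm separates clusters of different types sensibly and identifies outliers effectively, and that the resulting clusters have low entropy.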
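
A minimal sketch of the silhouette-based choice of K described above, assuming the candidate clusterings for each K have already been produced by some clustering step. The function names, the sample points, and the candidate labelings are hypothetical, and this is not the paper's implementation.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs at least 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # the silhouette of a singleton cluster is taken as 0
        # a: mean distance to the other members of the same cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, candidates):
    """candidates maps K to a labeling; return the K with the highest score."""
    return max(candidates, key=lambda k: silhouette_score(points, candidates[k]))


# Hypothetical data: two tight groups, labelled with K=2 and with K=3.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
candidates = {2: [0, 0, 0, 1, 1, 1], 3: [0, 0, 2, 1, 1, 1]}
k_opt = best_k(points, candidates)  # 2
```

Splitting one of the tight groups lowers the mean silhouette, so the two-cluster labeling wins here.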

9.
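K-means is one of the most popular and robust clustering algorithms. In practice, however, the number of classes in a real dataset cannot be determined in advance, and randomly chosen initial cluster centers tend to trap the clustering result in a local optimum. An adaptive improved algorithm based on the maximum-distance median and the sum of squared errors (SSE) is therefore proposed. The algorithm computes the initial cluster centers and uses the trend in SSE to decide whether to stop clustering or to continue splitting clusters, thereby determining the number of clusters automatically. Experiments were run on four UCI datasets. The results show that, compared with the traditional clustering algorithm and without increasing the number of iterations, the improved algorithm raises clustering accuracy by 17.133%, 22.416%, 1.545%, and 0.238%, respectively, and gives more stable clustering results.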
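
The SSE-driven stopping decision can be illustrated with a small sketch. The abstract does not state the exact rule, so the relative-gain threshold `min_gain`, the function names, and the sample labelings below are assumptions, not the paper's algorithm.

```python
from math import dist


def sse(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        centroid = tuple(sum(c) / len(members) for c in zip(*members))
        total += sum(dist(p, centroid) ** 2 for p in members)
    return total


def stop_k(sse_by_k, min_gain=0.5):
    """Smallest k after which one more split cuts SSE by less than min_gain."""
    ks = sorted(sse_by_k)
    for k, nxt in zip(ks, ks[1:]):
        current = sse_by_k[k]
        if current == 0 or (current - sse_by_k[nxt]) / current < min_gain:
            return k
    return ks[-1]


# Hypothetical data: the same points clustered with 1, 2 and 3 clusters.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labelings = {1: [0, 0, 0, 0, 0, 0],
             2: [0, 0, 0, 1, 1, 1],
             3: [0, 0, 2, 1, 1, 1]}
sse_by_k = {k: sse(points, labels) for k, labels in labelings.items()}
chosen = stop_k(sse_by_k)  # 2
```

Going from one to two clusters removes almost all of the SSE, while a third cluster removes less than half of what remains, so splitting stops at two under this assumed threshold.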

10.
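A scheme for optimizing the value of k in the K-means algorithm   Total citations: 1, self-citations: 0, citations by others: 1
Clustering is one of the core techniques in data mining, and k-means holds an important place among classical clustering algorithms. Because the optimal number of clusters k for k-means is hard to obtain, the algorithm's applicability is limited. A method for optimizing k is therefore proposed: starting from a candidate number of clusters larger than the optimal number, an optimized number of clusters is obtained. The method is verified on examples, and the results show that it is reasonable and effective.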

11.
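Genome rearrangement is an important problem in molecular biology, and the study of evolution can be reduced to the study of evolutionary distance, that is, computing the minimum number of evolutionary transformations needed to turn one genome into another. The reversal problem can be studied using the cycle graph between genomes. Hannenhalli gave a linear-time algorithm for computing the components of the cycle graph, but it operates on the set of cycles in the graph and requires some equivalence transformations. Starting instead from the edge set, this paper gives a linear-time algorithm for computing the connected components of the cycle graph of oriented genomes.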

12.
The K-means algorithm is one of the most popular clustering techniques. Nevertheless, its performance depends heavily on the initial cluster centers, and it tends to converge to local minima. This paper proposes a hybrid evolutionary-programming-based clustering algorithm, called PSO-SA, that combines particle swarm optimization (PSO) and simulated annealing (SA). The basic idea is to search around the global solution with SA and to increase the information exchange among particles with a mutation operator in order to escape local optima. Three datasets, Iris, Wisconsin Breast Cancer, and Ripley's Glass, are used to show the effectiveness of the proposed clustering algorithm in providing optimal clusters. The simulation results show that the PSO-SA clustering algorithm not only has a better response but also converges more quickly than the K-means, PSO, and SA algorithms.

13.
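Research on document clustering algorithms based on the vector space model   Total citations: 3, self-citations: 0, citations by others: 3
With the rapid growth of online information, document clustering has become an active research topic. Several document clustering algorithms based on the vector space model are examined, such as the common k-means algorithm and the agglomerative hierarchical algorithm; to address their shortcomings, an improved BK-means algorithm and a multi-level CFK-means algorithm are proposed. Finally, according to certain evaluation criteria, BK-means is found to be the better of the document clustering algorithms considered.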

14.
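Traditional undersampling methods for imbalanced data consider only the absolute position of majority-class samples and ignore their relative position, so the resulting balanced dataset suffers from blurred boundaries. An undersampling algorithm for imbalanced data based on improved K-means clustering (UD-PK) is proposed. The algorithm first uses an improved PSO algorithm to search iteratively for the global optimum, which serves as the initial values required by K-means. It then clusters the data with K-means, sets the number of majority-class samples to draw from each cluster according to the ratio of majority to minority samples in that cluster, and selects majority-class samples by their distance to the cluster center to build the balanced dataset. Comparative experiments on UCI datasets show that the algorithm substantially improves minority-class accuracy over several classical algorithms.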

15.
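In the Internet era, discussion of trending online topics strongly influences the thinking of university students, so monitoring public opinion on campus is of great importance. An improved K-means algorithm is used to cluster campus public opinion and identify hot topics, which can then be used to guide discussion of those topics; this contributes significantly to improving ideological and political work among university students. Experiments show that the improved algorithm reaches an accuracy of 75%, 8% higher than the traditional algorithm, and improves on the traditional algorithm's clustering results.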

16.
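With the arrival of the big data era, cluster analysis algorithms face challenges such as huge data volumes and growing data dimensionality, and distributed processing is one way to address them. This study combines the ROCK algorithm with the Hadoop platform and, following distributed-processing principles, uses a computer cluster to process large-scale, diverse data. Experiments show that ROCK clustering on the Hadoop platform substantially improves the ability to cluster high-dimensional data.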

17.
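Compared with other clustering algorithms, density peak clustering can handle data of arbitrary shape with few parameters and fast clustering. To address the loss of accuracy that occurs when a single class contains multiple density peaks, an improved density peak clustering algorithm is proposed. The algorithm compares density attributes between clusters to merge sub-clusters dynamically, reducing the influence of subjective factors on the result. Experimental comparison with existing density clustering algorithms shows that the improved algorithm not only avoids the effect that manually chosen parameters have on the original algorithm's results but also achieves better clustering performance.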

18.
We propose an adaptive fractional window increasing algorithm (AFW) to improve the performance of the fractional window increment (FeW) of Nahm et al. (2005). AFW fully utilizes the bandwidth when the network is idle and limits the operating window when the network is congested. We evaluate AFW and compare its total throughput with that of FeW in different scenarios over chain, grid, and random topologies and with hybrid traffic. Extensive simulations in ns2 show that, with limited modifications, AFW obtains 5% higher throughput than FeW, whose throughput is in turn significantly higher than that of TCP-NewReno.

19.
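A clustered beamforming algorithm based on channel capacity is proposed for multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) systems with limited feedback. The algorithm selects different beamforming vectors for different conditions, such as high and low signal-to-noise ratio (SNR), to increase system capacity. Exploiting the correlation that exists between clusters and between sub-clusters, a suboptimal beamforming algorithm is designed. Simulations of the proposed algorithm show that, at both high and low SNR, capacity-oriented clustered beamforming can increase to some extent the channel capacity of MIMO-OFDM systems over deterministic and random channels.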

20.
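Li Xiaofei, Li Hao. 《唐山学院学报》 (Journal of Tangshan College), 2010, (3): 44-44, 45, 46
Discretization of continuous attributes is an important aspect of machine learning and one of the problems of data preprocessing. The rough-set-based hierarchical clustering algorithm (RAHCA) presented in this paper improves on hierarchical clustering: it adjusts its parameters automatically in search of better clustering results. Experimental results confirm the algorithm's feasibility, and it performs particularly well when clustering symbolic attributes.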
