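Application Research of Computers (计算机应用研究)

Foundation item: Science and Technology Research Project of the Heilongjiang Provincial Department of Education (No.

About the authors: Wan Jing (1972-), female, professor, master's supervisor, Ph.D., main research interests: database theory and applications; Zhang Yi (1989-), male, master's student, main research interest: spatial data mining; He Yunbin (1972-), male, professor, graduate supervisor, main research interests: database theory and applications, spatio-temporal databases, embedded systems; Li Song (1977-), male, associate professor, Ph.D., main research interests: spatial database theory and applications (837734463@)

Dynamic clustering algorithm based on KD-tree and K-means method

Wan Jing, Zhang Yi, He Yunbin, Li Song

(School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China)

Abstract (Chinese version, translated): The traditional K-means clustering algorithm is sensitive to the initial center points and easily falls into local optima. To address this, the paper first proposes a KD-tree-based method for selecting the initial cluster centers. The method builds a KD-tree to partition the data set into rectangular units, computes the center and the density of each rectangular unit, sorts the units by density in descending order, and takes the centers of the first k units as the initial cluster centers, which effectively overcomes the traditional algorithm's sensitivity to the initial centers. In addition, because the traditional K-means algorithm cannot handle the clustering of dynamic data effectively, the paper further proposes the KDTK-means clustering algorithm. The algorithm builds a new KD-tree over the k cluster centers selected by the KD-tree-based optimization and the incremental data, then uses a nearest-neighbor search strategy to assign each incremental data object to its corresponding cluster, completing the clustering. Experimental results show that, compared with the traditional K-means clustering algorithm, the proposed KD-tree-based initial center selection chooses representative initial centers effectively, and the proposed KDTK-means algorithm handles incremental data clustering quickly and efficiently.

Keywords: K-means clustering; KD-tree; incremental clustering; initial cluster centers

CLC number: TP1

Abstract: The traditional K-means algorithm is sensitive to the initial centers and is easily trapped in local optima. To overcome these disadvantages, this paper proposes a new method based on the KD-tree. The new method first divides the data into a series of rectangular units by using a KD-tree, sorts the rectangular units by density, and then chooses the centers of the k densest units as the initial clustering centers. The experimental results show that the proposed method depends only weakly on the initial data and yields better clustering quality. Meanwhile, since the traditional K-means algorithm cannot handle dynamic clustering effectively, a new improved algorithm called the KDTK-means algorithm is proposed. The KDTK-means algorithm builds a new KD-tree from the incremental data and the optimized k initial centers, and then assigns each incremental data object to its corresponding cluster by a nearest-neighbor search strategy. The experiment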


