A K-means Clustering Method Based on Dimension Reduction - Journal of Hunan City University (Natural Science).PDF

Posted 2019-04-12 from Tianjin


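Journal of Hunan City University (Natural Science), Vol. 26, No. 1, January 2017

A K-means Clustering Method Based on Dimension Reduction

XU Yong, CHEN Liang
(School of Management Science and Engineering, Anhui University of Finance and Economics, Bengbu, Anhui 233000, China)

Abstract: The curse of dimensionality is an important problem in data mining. To address it in K-means clustering, this paper uses Euclidean distance as the distance measure and applies principal component analysis (PCA) to reduce the dimensionality of the data source. Experiments measure the clustering time of the K-means method on data of different sizes and characteristics, and a control group is set up to compare three aspects: time, difference between results, and number of iterations. The experiments lead to three conclusions. First, the size and the dimensionality of the data source jointly determine the time benefit of clustering after dimensionality reduction: the more records there are, the larger the time benefit, and the higher the dimensionality, the smaller the time benefit. Second, the degree of linearity of the data source determines how much the results of reduced and non-reduced clustering differ: the more linear the data, the smaller the difference between the two clustering results, and the less linear the data, the larger the difference. Third, the K-means algorithm converges quickly: both clusterings converge within Sqrt(Row) iterations.

Keywords: clustering algorithm; dimensionality reduction; K-means; principal component analysis

CLC number: N32
Document code: A
doi: 10.3969/j.issn.1672-7304.2017.01.12
Article ID: 1672–7304(2017)01–0054–08

Abstract (the authors' English version): The curse of dimensionality is an important problem in data mining. To address the curse of dimensionality in K-means clustering, this paper uses Euclidean distance as the distance measure and applies principal component analysis (PCA) to the data source to reduce its dimensionality, then measures experimentally the clustering time of the K-means method on data of different scales and characteristics. Both the time and the difference are compared with a control group. It is concluded from the experiments that the size and the dimension of the data source combined effect of time be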