

Journal of Changchun University of Science and Technology (Natural Science Edition), Vol. 38, No. 3, June 2015

The K-means Clustering Algorithm Research Based on the MapReduce Model

WANG Peng, WANG Ruijie
(School of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022)

Abstract: To address the long processing times and low execution rates that increasingly affect big data processing, this paper proposes a method for improving the efficiency of clustering large-scale data. Taking the K-means clustering algorithm as the prototype and exploiting the strengths of the MapReduce model in large-scale data processing, the original algorithm is parallelized, yielding a K-means clustering MapReduce model based on the Hadoop distributed cloud platform. The model is applied in a clustering experiment on simulated Taobao user data. The results demonstrate the feasibility of the method: the MapReduce implementation of the K-means clustering algorithm outperforms the original algorithm, shortening clustering time and improving clustering efficiency, and it is particularly suited to clustering massive data sets.

Keywords: big data; MapReduce model; K-means clustering algorithm

CLC number: TP391  Document code: A  Article ID: 1672-9870(2015)03-0120-05

With the rapid development of computer network and communication technology, we have now entered the era of big data. Broadly speaking, big data refers to data that, within a given period of time, cannot be [captured or processed] using conventional computers and soft…

… model implementation. MapReduce is a programming model, proposed by Google, that can process massive data sets in parallel on a Hadoop [6] distributed cloud cluster. It offers good fault tolerance and scalability; Map and …
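The excerpt above ends before the paper's own Map and Reduce designs are given, so the following is only a minimal single-process sketch of how one K-means iteration is commonly expressed in MapReduce terms, not the authors' implementation. The function names (kmeans_map, kmeans_reduce, kmeans_iteration, kmeans), the use of Euclidean distance, and the convergence tolerance are all assumptions made for illustration; no Hadoop API is used.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available since Python 3.8


def kmeans_map(point, centroids):
    """Map step (hypothetical): emit (nearest centroid index, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def kmeans_reduce(values):
    """Reduce step (hypothetical): average the points of one cluster.

    Each value is (partial_sum, count), so pre-summed values from a
    combiner could be merged in the same way as raw (point, 1) pairs.
    """
    total = None
    count = 0
    for partial_sum, n in values:
        if total is None:
            total = list(partial_sum)
        else:
            total = [a + b for a, b in zip(total, partial_sum)]
        count += n
    return tuple(s / count for s in total)


def kmeans_iteration(points, centroids):
    """Simulate one MapReduce round in a single process."""
    groups = defaultdict(list)  # stands in for the shuffle phase
    for p in points:
        key, value = kmeans_map(p, centroids)
        groups[key].append(value)
    # A centroid that attracted no points keeps its old position.
    return [
        kmeans_reduce(groups[i]) if i in groups else tuple(centroids[i])
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, max_iter=100, tol=1e-9):
    """Repeat rounds until no centroid moves by more than tol (assumed rule)."""
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        if all(dist(a, b) <= tol for a, b in zip(updated, centroids)):
            return updated
        centroids = updated
    return centroids
```

In this sketch each round corresponds to one MapReduce job: the map calls are independent per point, which is what makes the assignment step parallelizable, while the reduce call for each cluster only needs the values grouped under that cluster's key.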
