Optimization Study and Application on K Value of K-means Algorithm



Computer & Digital Engineering, Vol. 41 No. 11 (Total No. 289), Issue 11, 2013, p. 1713

Optimization Study and Application on K Value of K-means Algorithm

SUN Zhenjiang, LIANG Yongquan, FAN Jiancong, MA Yuankun, LIANG Tianyi
(College of Information Science and Engineering, Shandong University of Science and Technology, Qingdao 266590)

Abstract: In the field of data mining, K-means is a classical clustering algorithm, but the value of K must be set in advance, and the performance of the algorithm is sensitive to that value. With the advent of the big-data era, it is difficult for users to determine K accurately. To optimize K, this paper combines the next-fit memory allocation algorithm from computer operating systems, density-based clustering, and estimation of distribution algorithms (EDAs) to propose a density-based next-fit K-value optimization algorithm. The feasibility of the algorithm is verified theoretically, and its effectiveness is verified by applying K-means to Chinese text clustering.

Key Words: K-means; estimation of distribution algorithms (EDAs); threshold; density
Class Number: TP181
DOI: 10.3969/j.issn1672-9722.2013.11.001

1 Introduction

Clustering [1] is the process of partitioning a set of data objects into multiple groups or clusters. With the advent of the big-data era, clustering as a data mining tool has taken root in many application areas, such as biology, security, business intelligence ...

... know the true value of K, which in turn degrades the effectiveness of the K-means algorithm.

This paper focuses on drawback 2) of K-means. Many researchers have already done extensive work on drawback 2): Bezdek J C [...] proposed a partitioning concept based on sample membership degrees, in which the number of clusters K is obtained by satisfying min(J_m(U, c)), its ...
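
The criterion min(J_m(U, c)) mentioned above can be made concrete with a small sketch. The excerpt does not define J_m, so the code below assumes it is the standard fuzzy c-means objective, J_m(U, c) = sum over samples i and clusters j of u_ij^m * ||x_i - c_j||^2, with each sample's memberships summing to 1. The function names (`squared_distance`, `fcm_memberships`, `fcm_objective`) and the default fuzzifier m = 2 are illustrative choices, not taken from the paper, and the sketch does not reproduce how the cited work selects K.

```python
def squared_distance(x, c):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def fcm_memberships(points, centers, m=2.0):
    """Standard fuzzy c-means membership update for fixed centers (assumed form).

    u_ij = 1 / sum_k (d_ij^2 / d_ik^2) ** (1 / (m - 1)), requires m > 1.
    """
    exponent = 1.0 / (m - 1.0)
    memberships = []
    for x in points:
        sq = [squared_distance(x, c) for c in centers]
        zeros = [j for j, d in enumerate(sq) if d == 0.0]
        if zeros:
            # The point coincides with one or more centers: share full membership among them.
            row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(len(centers))]
        else:
            row = [1.0 / sum((d_j / d_k) ** exponent for d_k in sq) for d_j in sq]
        memberships.append(row)
    return memberships


def fcm_objective(points, centers, memberships, m=2.0):
    """J_m(U, c) = sum_i sum_j u_ij**m * ||x_i - c_j||**2 (assumed definition)."""
    total = 0.0
    for x, u_row in zip(points, memberships):
        for c, u in zip(centers, u_row):
            total += (u ** m) * squared_distance(x, c)
    return total
```

For example, with points (0, 0), (0, 1), (4, 0), (4, 1) and centers (0, 0.5), (4, 0.5), a crisp assignment of each point to its nearer center gives J_m = 1.0, and the memberships returned by `fcm_memberships` give a smaller value for the same centers. Comparing such values across candidate K is one way a membership-based criterion could be evaluated, under the assumptions stated above.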
