Research on Graph-Based Clustering Algorithms + Literature Review

Published 2017-08-30 in Zhejiang

Abstract

A clustering algorithm groups the points of a data set that differ little from one another into a number of clusters, each of which reflects the characteristics of its members. Clustering is a key step in data mining. Graph-based methods represent data objects as nodes and the proximity between objects as the weights of the edges between the corresponding nodes, so that important properties of graphs (such as sparsification of the proximity graph) can be applied to cluster analysis and improve the efficiency of clustering algorithms. This thesis uses the Chameleon algorithm to analyze in detail how graph methods are applied in clustering, and compares it experimentally with the K-means algorithm to show that graph-based clustering can discover clusters of arbitrary shape and size. It also summarizes the shortcomings of the Chameleon algorithm that appeared during the experiments.

Keywords: clustering, graph, Chameleon, K-means, sparsification

1.1.1 Content of Data Mining

Data mining draws on many disciplines, including mathematics, computer science, and economics and management, so a clear taxonomy is especially important [ ]. Data mining can be broadly divided into classification, clustering, and association rules [ ].

Classification methods [ ][ ][ ] divide a data set into classes according to some criterion and tell the user which class a given item belongs to. They require a training sample set: the test set is compared against the training samples, and each item in the test set is ultimately assigned to one of the classes defined by the training set.

Clustering methods [ ][ ] group the points of a data set that differ little from one another into a number of clusters, each of which reflects the characteristics of its members. No training set is required; the user only needs to make a few simple settings, such as the number of clusters or the clustering criterion, to obtain a clustering result.

Association rules [ ] scan the entire data set to surface pairs or groups of items that frequently occur together. The approach is mainly statistical: given a threshold, it extracts from massive data the combinations that meet the threshold, and the elements appearing in a combination are taken to be related to one another.

1.1.2 Significance of Data Mining

Data mining means extracting the information hidden in large volumes of complex data, for example customer classification, clustering, fraud detection, and identification of potential customers, and the now very widely applied
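
The graph representation described in the abstract (data objects as nodes, proximity as edge weights, and a sparsified proximity graph) can be illustrated with a small sketch. The code below is a minimal illustration under stated assumptions, not the thesis's implementation and not the Chameleon algorithm itself: the function names `knn_graph` and `connected_components` are hypothetical, the weight `1 / (1 + distance)` is an arbitrary choice of proximity measure, and reading clusters off as connected components is only a stand-in for Chameleon's partitioning and merging phases. It shows how keeping only each point's k nearest neighbors reduces the number of edges while still separating well-separated groups.

```python
import math


def knn_graph(points, k):
    """Build a sparsified proximity graph from a list of coordinate tuples.

    Each point keeps edges only to its k nearest neighbors. The graph is
    made symmetric: an edge exists if either endpoint selects the other.
    Edge weight is 1 / (1 + Euclidean distance), an assumed proximity
    measure in which closer points receive larger weights.
    """
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        # Distances from point i to every other point, nearest first.
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            w = 1.0 / (1.0 + d)
            graph[i][j] = w
            graph[j][i] = w
    return graph


def connected_components(graph):
    """Return the connected components of the graph as sorted index lists."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(sorted(component))
    return components


if __name__ == "__main__":
    # Two well-separated groups of three points each (made-up data).
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    g = knn_graph(pts, k=2)
    print(connected_components(g))
```

With six points a complete proximity graph has 15 edges, while the k = 2 graph above keeps 6, and no edge crosses between the two groups. On real data the sparsified graph is usually still connected, which is why an algorithm such as Chameleon needs further partitioning and merging steps rather than a simple component search.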