Random Projection for High Dimensional Data Clustering: A Cluster Ensemble Approach
Xiaoli Zhang Fern, Carla E. Brodley, ICML 2003
Presented by Dehong Liu

Contents
- Motivation
- Random projection and the cluster ensemble approach
- Experimental results
- Conclusion

Motivation
High dimensionality poses two challenges for unsupervised learning:
- The presence of irrelevant and noisy features can mislead the clustering algorithm.
- In high dimensions, data may be sparse, making it difficult to find any structure in the data.
Two basic approaches to reduce the dimensionality:
- Feature subset selection;
- Feature transformation: PCA, random projection.

Motivation: random projection
Advantages:
- A general data reduction technique;
- Has been shown to have special promise for high dimensional data clustering.
Disadvantage:
- Highly unstable: different random projections may lead to radically different clustering results.

Idea
Aggregate multiple runs of clustering to achieve better clustering performance.
- A single run of clustering consists of applying random projection to the high dimensional data and clustering the reduced data using EM.
- Multiple runs of clustering are performed and the results are aggregated to form an n × n similarity matrix.
- An agglomerative clustering algorithm is then applied to the matrix to produce the final clusters.

A single run
Random projection: X' = X · R
- X': n × d', reduced-dimension data set
- X: n × d, high-dimensional data set
- R: d × d', generated by first setting each entry of the matrix to a value drawn i.i.d. from N(0, 1) and then normalizing the columns to unit length.
EM clustering

Aggregating multiple clustering results
- The probability that data point i belongs to each cluster l under the model θ: P(l | i, θ), l = 1, ..., k
- The probability that data points i and j belong to the same cluster under the model θ: P_ij^θ = Σ_{l=1..k} P(l | i, θ) · P(l | j, θ)
- P_ij forms a "similarity" matrix.

Producing final clusters

Experimental results
Evaluation criteria
- Conditional Entropy (CE): measures the uncertai
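
As an illustration of the single-run projection and the aggregation step described in this section, here is a minimal pure-Python sketch. It is not the authors' code. The function names are hypothetical, the EM step is not implemented (the soft memberships below are hand-written stand-ins for what EM would return), the same-cluster probability is taken as the sum over clusters of the product of the two points' membership probabilities, and averaging the per-run matrices is an assumed way of aggregating them.

```python
import math
import random


def random_projection_matrix(d, d_reduced, rng):
    """Build a d x d_reduced matrix: N(0, 1) entries, columns scaled to unit length."""
    cols = []
    for _ in range(d_reduced):
        col = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in col))
        cols.append([v / norm for v in col])
    # Transpose the list of columns into a list of rows.
    return [[cols[c][r] for c in range(d_reduced)] for r in range(d)]


def project(X, R):
    """Compute X' = X . R for X (n x d) and R (d x d')."""
    d_reduced = len(R[0])
    return [
        [sum(x[r] * R[r][c] for r in range(len(R))) for c in range(d_reduced)]
        for x in X
    ]


def same_cluster_probabilities(memberships):
    """For one run: P_ij = sum over clusters l of P(l | i) * P(l | j)."""
    n = len(memberships)
    return [
        [sum(a * b for a, b in zip(memberships[i], memberships[j])) for j in range(n)]
        for i in range(n)
    ]


def aggregate(runs):
    """Average the per-run P_ij matrices into one n x n similarity matrix (assumption)."""
    n = len(runs[0])
    total = [[0.0] * n for _ in range(n)]
    for memberships in runs:
        P = same_cluster_probabilities(memberships)
        for i in range(n):
            for j in range(n):
                total[i][j] += P[i][j] / len(runs)
    return total


# Toy data: three points in d = 4 dimensions, projected down to d' = 2.
rng = random.Random(0)
X = [[1.0, 0.0, 2.0, 0.5], [0.9, 0.1, 2.1, 0.4], [-1.0, 3.0, 0.0, 1.0]]
R = random_projection_matrix(4, 2, rng)
X_reduced = project(X, R)

# Stand-in soft memberships from two runs (rows: points, columns: clusters).
# The cluster labels are swapped between the runs on purpose.
run_a = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
run_b = [[0.2, 0.8], [0.3, 0.7], [1.0, 0.0]]
similarity = aggregate([run_a, run_b])
```

Because P_ij only asks whether two points share a cluster, it does not depend on how each run happens to label its clusters, so runs can be combined without matching labels. The resulting `similarity` matrix is what an agglomerative clustering step would then take as input; that step is left out of this sketch.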