Two-Step Clustering: The BIRCH Algorithm (两步聚类BIRCH算法.ppt)

K-Means Example, Step 2
[Figure: data points with the cluster centers moved to their new positions after the 1st iteration]

K-Means Example, Step 3
[Figure: cluster centers moved again after the 2nd iteration]

Main Techniques (2): Hierarchical Clustering
- Multilevel clustering: level 1 has n clusters, ..., level n has one cluster (or upside down).
- Agglomerative HC: starts with singleton clusters and merges them (bottom-up).
- Divisive HC: starts with one cluster containing all samples and splits clusters (top-down).
- The result is drawn as a dendrogram.

Agglomerative HC Example: Nearest Neighbor
- Level 2: k = 7 clusters
- Level 3: k = 6 clusters
- Level 4: k = 5 clusters
- Level 5: k = 4 clusters
- Level 6: k = 3 clusters
- Level 7: k = 2 clusters
- Level 8: k = 1 cluster

Remarks
- Partitioning clustering: time complexity O(n). Pros: easy to use and relatively efficient. Cons: sensitive to initialization (a bad start can lead to bad results); needs to store all data in memory.
- Hierarchical clustering: time complexity O(n^2 log n). Pros: outputs a dendrogram, which is desired in many applications. Cons: higher time complexity; needs to store all data in memory.

Introduction to BIRCH
- Designed for very large data sets, where time and memory are limited.
- Incremental and dynamic clustering of incoming objects.
- Only one scan of the data is necessary; the whole data set is not needed in advance.
- Two key phases: (1) scan the database to build an in-memory tree; (2) apply a clustering algorithm to the leaf nodes.

Similarity Metric (1)
Given a cluster of instances, we define:
- Centroid: the mean of the member points.
- Radius: the average distance from member points to the centroid.
- Diameter: the average pairwise distance within the cluster.

Similarity Metric (2)
Distance measures between two clusters:
- Centroid Euclidean distance
- Centroid Manhattan distance
- Average inter-cluster distance
- Average intra-cluster distance
- Variance increase

Clustering Feature
The BIRCH algorithm builds a dendrogram called a clustering feature (CF) tree.
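The formulas on the Similarity Metric slides were embedded as images and did not survive extraction. The standard definitions from the BIRCH literature, which match the verbal descriptions (centroid, radius, diameter, and the centroid Euclidean/Manhattan distances), are, for a cluster of $N$ points $x_i$:

```latex
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
R = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \lVert x_i - \bar{x} \rVert^2}, \qquad
D = \sqrt{\frac{\sum_{i=1}^{N}\sum_{j=1}^{N} \lVert x_i - x_j \rVert^2}{N(N-1)}}
```

and, between two clusters with centroids $\bar{x}_1$ and $\bar{x}_2$:

```latex
D_0 = \lVert \bar{x}_1 - \bar{x}_2 \rVert_2
\quad \text{(centroid Euclidean)}, \qquad
D_1 = \lVert \bar{x}_1 - \bar{x}_2 \rVert_1
\quad \text{(centroid Manhattan)}
```

The average inter-cluster, average intra-cluster, and variance-increase distances are defined analogously over the union of the two clusters.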
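The K-Means slides at the top show the two-step loop: assign each point to its nearest center, then recompute the centers. A minimal sketch of that loop (Lloyd's algorithm); the sample points, k, seed, and iteration count are illustrative assumptions, not from the slides:

```python
import random

def kmeans(points, k, iters=10, seed=0):
    """Lloyd's algorithm: repeat (assign points to nearest center,
    recompute each center as the mean of its assigned points)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize from the data (one of many options)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            groups[i].append(p)
        # Update step: "new center after this iteration" = mean of its group.
        for i, g in enumerate(groups):
            if g:
                centers[i] = tuple(sum(xs) / len(g) for xs in zip(*g))
    return centers

# Two well-separated blobs: the centers converge to the blob means.
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
       (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(sorted(kmeans(pts, 2)))
```

Note the sensitivity to initialization mentioned in the Remarks slide: with less separated data or an unlucky seed, the loop can settle into a poor local optimum.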
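The nearest-neighbor example levels merge the closest pair of clusters at each step, so k drops by one per level (7, 6, ..., 1). A naive bottom-up sketch of that procedure with single-link distance; the sample points are made up for illustration, and the brute-force pair search is only meant for small examples:

```python
def single_link_agglomerative(points, k):
    """Agglomerative HC with nearest-neighbor (single-link) distance:
    start with singletons, merge the closest pair until k clusters remain."""
    clusters = [[p] for p in points]

    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    def link(c1, c2):
        # Single link: distance between the closest pair across the clusters.
        return min(dist2(p, q) for p in c1 for q in c2)

    while len(clusters) > k:
        # Find the pair of clusters with the smallest single-link distance.
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

pts = [(0.0,), (1.0,), (5.0,), (6.0,), (20.0,)]
print(single_link_agglomerative(pts, 2))  # the nearby points merge; (20.0,) stays alone
```

Recording the merge order down to k = 1 yields exactly the dendrogram the slides describe.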
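The clustering feature behind the CF tree is the triple CF = (N, LS, SS): point count, linear sum, and sum of squared norms. A short sketch of why this summary is enough (centroid and radius fall out of it directly) and why CFs are additive, which is what makes incremental, single-scan insertion possible; the function names and sample points here are mine, not the slides':

```python
import math

def make_cf(points):
    """CF = (N, LS, SS): count, component-wise linear sum, sum of squared norms."""
    n = len(points)
    dim = len(points[0])
    ls = [sum(p[i] for p in points) for i in range(dim)]
    ss = sum(x * x for p in points for x in p)
    return (n, ls, ss)

def merge_cf(cf1, cf2):
    """CF additivity: merging two clusters is component-wise addition."""
    n1, ls1, ss1 = cf1
    n2, ls2, ss2 = cf2
    return (n1 + n2, [a + b for a, b in zip(ls1, ls2)], ss1 + ss2)

def centroid(cf):
    n, ls, _ = cf
    return [x / n for x in ls]

def radius(cf):
    """Average distance from members to the centroid: R = sqrt(SS/N - ||LS/N||^2)."""
    n, ls, ss = cf
    c2 = sum((x / n) ** 2 for x in ls)
    return math.sqrt(max(ss / n - c2, 0.0))

a = make_cf([(1.0, 0.0), (3.0, 0.0)])
b = make_cf([(5.0, 0.0), (7.0, 0.0)])
m = merge_cf(a, b)          # same CF as building it from all four points at once
print(centroid(m))          # [4.0, 0.0]
print(radius(m))            # sqrt(5) ≈ 2.236
```

Because a merged cluster's CF is just the sum of its children's CFs, a node in the tree never needs the raw points, only these triples, which is how BIRCH keeps the whole summary in memory after one scan.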
