chapter10ClusBasic.ppt

Similarity Defined by SimTree

  Path-based node similarity:
    sim_p(n7, n8) = s(n7, n4) × s(n4, n5) × s(n5, n8)
  Similarity between two nodes is the average similarity between objects linked with them in other SimTrees
  Adjustment ratio for x =
    (average similarity between x and all other nodes) / (average similarity between x's parent and all other nodes)

  [Figure: a SimTree over nodes n1 to n9 with edge similarities 0.9, 1.0, 0.9, 0.8, 0.2, 0.3, 0.8, 0.9. Callouts: "Similarity between two sibling nodes n1 and n2"; "Adjustment ratio for node n7".]

*

LinkClus: Efficient Clustering via Heterogeneous Semantic Links

  Method
    Initialize a SimTree for objects of each type
    Repeat until stable:
      For each SimTree, update the similarities between its nodes using similarities in other SimTrees
        Similarity between two nodes x and y is the average similarity between objects linked with them
      Adjust the structure of each SimTree
        Assign each node to the parent node that it is most similar to

  For details: X. Yin, J. Han, and P. S. Yu, "LinkClus: Efficient Clustering via Heterogeneous Semantic Links", VLDB06

*

Initialization of SimTrees

  Initializing a SimTree
    Repeatedly find groups of tightly related nodes, which are merged into a higher-level node
  Tightness of a group of nodes
    For a group of nodes {n1, …, nk}, its tightness is defined as the number of leaf nodes in other SimTrees that are connected to all of {n1, …, nk}

  [Figure: nodes n1 and n2 linked to leaf nodes 1 to 5 in another SimTree. The tightness of {n1, n2} is 3.]

*

Finding Tight Groups by Freq. Pattern Mining

  Finding tight groups is reduced to frequent pattern mining
  Procedure of initializing a tree
    Start from leaf nodes (level-0)
    At each level l, find non-overlapping groups of similar nodes with frequent pattern mining

  [Figure: nodes n1 to n4 linked to leaf nodes 1 to 9, merged into groups g1 and g2.]
  Transactions:
    {n1}
    {n1, n2}
    {n2}
    {n1, n2}
    {n1, n2}
    {n2, n3, n4}
    {n4}
    {n3, n4}
    {n3, n4}

  The tightness of a group of nodes is the
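The path-based similarity on the "Similarity Defined by SimTree" slide is a product of the similarities along the tree path between two leaves. The sketch below illustrates that product in Python. The numeric values in `s` are hypothetical (the extracted figure does not show which value belongs to which edge), and the helper names `edge_sim` and `path_similarity` are made up for this example.

```python
from math import prod

# Hypothetical similarity values for illustration only; the slide's figure
# does not let us recover which value sits on which edge.
s = {
    ("n7", "n4"): 0.9,  # leaf n7 to its parent n4
    ("n4", "n5"): 0.2,  # sibling nodes n4 and n5
    ("n5", "n8"): 0.8,  # parent n5 to its leaf n8
}

def edge_sim(a, b):
    """Look up a stored similarity, treating node pairs as unordered."""
    return s[(a, b)] if (a, b) in s else s[(b, a)]

def path_similarity(path):
    """Multiply the similarities of consecutive nodes along a SimTree path."""
    return prod(edge_sim(a, b) for a, b in zip(path, path[1:]))

# sim_p(n7, n8) = s(n7, n4) * s(n4, n5) * s(n5, n8)
sim_p = path_similarity(["n7", "n4", "n5", "n8"])
print(f"sim_p(n7, n8) is about {sim_p:.3f}")
```

With these assumed values the product is roughly 0.144. Because only the edges on the path are stored, the similarity of any leaf pair can be derived from the tree without keeping a full pairwise matrix.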
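The tightness definition from the "Initialization of SimTrees" slide can be checked against the transactions listed on the "Finding Tight Groups" slide. The sketch below assumes each transaction is the set of nodes linked to one leaf node of another SimTree, so the tightness of a group is the number of transactions containing the whole group (what frequent pattern mining calls the support of that itemset). The function `greedy_tight_pairs` is a hypothetical simplification that enumerates pairs by brute force; it is not the mining procedure from the LinkClus paper.

```python
from itertools import combinations

# Transactions as listed on the slide, assumed to be one per leaf node
# (leaf 1 to leaf 9) of another SimTree: each holds the nodes linked to it.
transactions = [
    {"n1"}, {"n1", "n2"}, {"n2"}, {"n1", "n2"}, {"n1", "n2"},
    {"n2", "n3", "n4"}, {"n4"}, {"n3", "n4"}, {"n3", "n4"},
]

def tightness(group, transactions):
    """Count the leaf nodes (transactions) linked to every node in the group."""
    group = set(group)
    return sum(1 for t in transactions if group <= t)

def greedy_tight_pairs(nodes, transactions, min_tightness=2):
    """Hypothetical simplification: pick non-overlapping pairs by tightness.

    A real implementation would use a frequent pattern miner and allow
    groups larger than two; this brute-force version is only illustrative.
    """
    pairs = sorted(combinations(sorted(nodes), 2),
                   key=lambda p: -tightness(p, transactions))
    used, groups = set(), []
    for pair in pairs:
        if tightness(pair, transactions) < min_tightness:
            break  # remaining pairs are even less tight
        if used.isdisjoint(pair):
            groups.append(set(pair))
            used.update(pair)
    return groups

print(tightness({"n1", "n2"}, transactions))
print(greedy_tight_pairs({"n1", "n2", "n3", "n4"}, transactions))
```

On this data {n1, n2} and {n3, n4} each have tightness 3 and are the two groups selected, which plausibly correspond to g1 and g2 in the slide's figure. The `min_tightness` threshold is an assumed parameter playing the role of a minimum support.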
