Stanford University - Big Data Mining - PageRank-13.ppt

CS345 Data Mining
Link Analysis Algorithms: Page Rank

Link Analysis Algorithms
  - Page Rank
  - Hubs and Authorities
  - Topic-Specific Page Rank
  - Spam Detection Algorithms
  - Other interesting topics we won't cover
    - Detecting duplicates and mirrors
    - Mining for communities

Ranking web pages
  - Web pages are not equally "important"
  - Inlinks as votes
    - One page has 23,400 inlinks; another has 1 inlink
  - Are all inlinks equal?
    - Recursive question!

Simple recursive formulation
  - Each link's vote is proportional to the importance of its source page
  - If page P with importance x has n outlinks, each link gets x/n votes
  - Page P's own importance is the sum of the votes on its inlinks

Simple "flow" model
  - The web in 1839

Solving the flow equations
  - 3 equations, 3 unknowns, no constants
    - No unique solution
    - All solutions equivalent modulo scale factor
  - Additional constraint forces uniqueness
    - y + a + m = 1
    - y = 2/5, a = 2/5, m = 1/5
  - Gaussian elimination works for small examples, but we need a better method for large graphs

Matrix formulation
  - Matrix M has one row and one column for each web page
  - Suppose page j has n outlinks
    - If j → i, then M_ij = 1/n
    - Else M_ij = 0
  - M is a column stochastic matrix
    - Columns sum to 1
  - Suppose r is a vector with one entry per web page
    - r_i is the importance score of page i
    - Call it the rank vector
    - |r| = 1

Example

Eigenvector formulation
  - The flow equations can be written r = Mr
  - So the rank vector is an eigenvector of the stochastic web matrix
    - In fact, its first or principal eigenvector, with corresponding eigenvalue 1

Example

Power Iteration method
  - Simple iterative scheme (aka relaxation)
  - Suppose there are N web pages
  - Initialize: r_0 = [1/N, ..., 1/N]^T
  - Iterate: r_{k+1} = M r_k
  - Stop when |r_{k+1} - r_k|_1 < ε
    - |x|_1 = Σ_{1≤i≤N} |x_i| is the L1 norm
    - Can use any other vector norm, e.g., Euclidean

Power Iteration Example

Random Walk Interpretation
  - Imagine a random web surfer
    - At any time t, surfer is on some page P
    - At time t+1, the surfer follows an outlink from P uniformly at random
    - Ends up on some page Q linked from P
    - Proc
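The matrix formulation and the power iteration scheme described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the lecture's own code. The three-page link structure (`y`, `a`, `m`) is an assumption: the slide figure did not survive extraction, so the links below were chosen only because they are consistent with the stated solution y = 2/5, a = 2/5, m = 1/5. The function names, the tolerance `eps`, and the iteration cap `max_iter` are likewise hypothetical.

```python
def build_column_stochastic(links, pages):
    """Build M with M[i][j] = 1/n if page j links to page i (j has n outlinks), else 0."""
    index = {page: k for k, page in enumerate(pages)}
    size = len(pages)
    M = [[0.0] * size for _ in range(size)]
    for src, targets in links.items():
        j = index[src]
        for dst in targets:
            # Each of the n outlinks of page j carries 1/n of its importance.
            M[index[dst]][j] = 1.0 / len(targets)
    return M


def power_iteration(M, eps=1e-10, max_iter=1000):
    """Iterate r_{k+1} = M r_k from the uniform vector until the L1 change is below eps."""
    n = len(M)
    r = [1.0 / n] * n  # r_0 = [1/N, ..., 1/N]^T
    for _ in range(max_iter):
        r_next = [sum(M[i][j] * r[j] for j in range(n)) for i in range(n)]
        # L1 norm of the difference between successive iterates.
        if sum(abs(a - b) for a, b in zip(r_next, r)) < eps:
            return r_next
        r = r_next
    return r


# ASSUMED example graph (not given in the text): y -> {y, a}, a -> {y, m}, m -> {a}.
pages = ["y", "a", "m"]
links = {"y": ["y", "a"], "a": ["y", "m"], "m": ["a"]}

M = build_column_stochastic(links, pages)
r = power_iteration(M)
print({page: round(score, 4) for page, score in zip(pages, r)})
```

Because M is column stochastic, every iterate keeps summing to 1, so no explicit renormalization step is needed here. On the assumed graph the scores settle near y = 0.4, a = 0.4, m = 0.2, matching the solution of the flow equations.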
