Vincent Blondel and Paul Van Dooren, CESAME

  • About 9,640 characters
  • About 33 pages
  • Posted on 2017-03-09 in Shanghai

Web searching and graph similarity

Vincent Blondel and Paul Van Dooren*
CESAME, Université Catholique de Louvain
http://www.inma.ucl.ac.be/
* Thanks to P. Senellart
GAMM, 2003

The web graph
Nodes = web pages; edges = hyperlinks between pages.
About 3 billion nodes (Google searched 3,083,324,625 web pages in 2002).
An average of 7 outgoing links per page.
Growth of a few percent every month.

Outline
1. Structure of the web
2. Methods for searching the web (Google's PageRank and Kleinberg's HITS)
3. Similarity in graphs
4. Application to synonym extraction (Blondel-Senellart)

Structure of the web
Experiments: two crawls of over 200 million pages in 1999 found a giant strongly connected component (the core).
It contains most prominent sites.
It contains 30% of all pages.
The average distance between nodes is 16: a small world.
Ref: Broder et al., "Graph structure in the web", WWW9, 2000.

The web is a bowtie
Ref: "The web is a bowtie", Nature, May 11, 2000.

In- and out-degree distributions
Power-law distribution: the number of pages with in-degree $n$ is proportional to $1/n^{2.1}$ (Zipf's law).

A score for every page
The score of a page is high if the page has many incoming links from pages with high scores.
One browses from page to page by following outgoing links with equal probability. Score = frequency with which a page is visited.
However, some pages may have no outgoing links, and many pages have zero frequency.

PageRank: teleporting random score
The surfer follows a path by choosing an outgoing link with probability $p/d_{\mathrm{out}}(i)$, where $d_{\mathrm{out}}(i)$ is the number of outgoing links of page $i$
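The bowtie picture of Broder et al. splits the web graph around its giant strongly connected component: IN (pages that can reach the core), OUT (pages reachable from the core) and everything else (tendrils, tubes, disconnected pieces). The Python sketch below illustrates that split on a toy edge list. The helper names `reachable` and `bowtie` are hypothetical, and the brute-force search for the largest SCC is only meant for small graphs; a crawl of hundreds of millions of pages would need a linear-time SCC algorithm such as Tarjan's.

```python
from collections import defaultdict, deque


def reachable(adj, sources):
    """Return the set of nodes reachable from `sources` by following edges in `adj`."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def bowtie(edges):
    """Split a directed graph into CORE, IN, OUT and OTHER (bowtie-style sketch).

    CORE is the largest strongly connected component, IN the nodes that can
    reach CORE, OUT the nodes reachable from CORE; everything else (tendrils,
    tubes, disconnected pieces) is lumped into OTHER.
    """
    fwd, bwd = defaultdict(list), defaultdict(list)
    nodes = set()
    for u, v in edges:
        fwd[u].append(v)
        bwd[v].append(u)
        nodes.update((u, v))
    # The SCC of x is the set of nodes both reachable from x and reaching x.
    # Quadratic in the worst case: fine for a toy graph, not for a web crawl.
    core = set()
    for x in nodes:
        scc = reachable(fwd, [x]) & reachable(bwd, [x])
        if len(scc) > len(core):
            core = scc
    out_set = reachable(fwd, list(core)) - core
    in_set = reachable(bwd, list(core)) - core
    other = nodes - core - in_set - out_set
    return core, in_set, out_set, other
```

For example, with the cycle a→b→c→a, a page i linking into the cycle and a page o linked from it, `bowtie` puts a, b, c in the core, i in IN and o in OUT; a page reached only from i, or a separate pair x→y, ends up in OTHER.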
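The average distance of 16 is a mean over shortest directed paths. A minimal sketch of how such a figure can be computed on a small graph, assuming that only ordered pairs joined by a directed path are averaged (in a directed graph many pairs are not connected at all); `average_distance` is a hypothetical helper that runs one breadth-first search per page.

```python
from collections import deque


def average_distance(out_links):
    """Mean shortest directed path length over ordered pairs (u, v), u != v,
    for which a path from u to v exists. One BFS per source: O(n * (n + m)).
    """
    total, pairs = 0, 0
    for source in out_links:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in out_links.get(u, ()):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())  # the source itself contributes distance 0
        pairs += len(dist) - 1       # every other node reached forms one pair
    return total / pairs if pairs else float("nan")
```

On the path a→b→c the reachable pairs have distances 1, 2 and 1, so the average is 4/3.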
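The power law means that doubling the in-degree divides the number of pages by about $2^{2.1} \approx 4.29$, so the points $(\log n, \log \text{count})$ lie on a line of slope $-2.1$. The sketch below recovers that slope by least squares from idealised counts. `loglog_slope` is a hypothetical helper, and on real, noisy crawl data a least-squares fit on a log-log histogram is a crude estimator; maximum-likelihood fits are generally preferred.

```python
import math


def loglog_slope(degree_counts):
    """Least-squares slope of log(count) against log(degree).

    `degree_counts` maps an in-degree n >= 1 to the number of pages with that
    in-degree. For counts proportional to 1/n**k the slope is -k.
    """
    xs = [math.log(n) for n in degree_counts]
    ys = [math.log(c) for c in degree_counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Idealised counts following the law: pages with in-degree n proportional to 1/n^2.1.
counts = {n: 1e9 / n ** 2.1 for n in range(1, 1001)}
print(round(-loglog_slope(counts), 3))  # prints 2.1
```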
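A minimal power-iteration sketch of the teleporting random surfer, assuming the usual PageRank formulation: from page $i$ each outgoing link is followed with probability $p/d_{\mathrm{out}}(i)$, with the remaining probability $1-p$ the surfer jumps to a uniformly random page, and a page with no outgoing links sends the surfer to a uniformly random page. The function name `pagerank`, the default $p = 0.85$ and the tolerance are assumptions for illustration, not values from the slides.

```python
def pagerank(out_links, p=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for the teleporting random surfer (sketch).

    Assumptions: with probability 1 - p the surfer jumps to a uniformly random
    page, and a page without outgoing links sends the surfer to a uniformly
    random page. `out_links` must map every page to the list of pages it links to.
    """
    pages = list(out_links)
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(max_iter):
        # Probability mass sitting on dangling pages is spread uniformly.
        dangling = sum(score[i] for i in pages if not out_links[i])
        new = {page: (1.0 - p) / n + p * dangling / n for page in pages}
        for i in pages:
            d_out = len(out_links[i])
            for j in out_links[i]:
                # Each outgoing link of page i is followed with probability p / d_out(i).
                new[j] += p * score[i] / d_out
        if sum(abs(new[k] - score[k]) for k in pages) < tol:
            return new
        score = new
    return score
```

Because every page receives at least $(1-p)/n$ from teleporting, no page ends up with zero frequency, and pages without outgoing links no longer trap the surfer. For instance, with links a→c, b→c and c→a, page c (two incoming links) gets the highest score and page b (no incoming links) gets exactly $(1-p)/3$.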
