Web Crawling.ppt

  • About 15,500 characters
  • About 52 pages
  • Published 2017-02-15 in Beijing

Concurrent crawlers
  • Can use multi-processing or multi-threading
  • Each process or thread works like a sequential crawler, except that they all share two data structures: the frontier and the repository
  • Shared data structures must be synchronized (locked for concurrent writes)
  • A speedup by a factor of 5-10 is easy to achieve this way

Outline
  • Motivation and taxonomy of crawlers
  • Basic crawlers and implementation issues
  • Universal crawlers
  • Crawler ethics and conflicts

Universal crawlers
  • Support universal search engines
  • Large-scale
  • Huge cost (ne
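
The concurrent crawler slide above describes worker threads that each behave like a sequential crawler while sharing a frontier and a repository. The following is a minimal sketch of that idea in Python, not the slides' own implementation. The names `FAKE_WEB`, `fetch_links`, and `crawl` are hypothetical, and the in-memory link graph stands in for real HTTP fetching and link extraction so that the example runs without network access.

```python
import threading
from queue import Queue

# Hypothetical in-memory "web": URL -> outgoing links (replaces real HTTP fetches).
FAKE_WEB = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": [],
}


def fetch_links(url):
    """Stand-in for downloading a page and extracting its links."""
    return FAKE_WEB.get(url, [])


def crawl(seeds, num_threads=4):
    frontier = Queue()        # shared frontier; Queue handles its own locking
    repository = {}           # shared repository: URL -> extracted links
    seen = set(seeds)         # URLs that have already been put on the frontier
    lock = threading.Lock()   # guards concurrent writes to `seen` and `repository`

    for url in seeds:
        frontier.put(url)

    def worker():
        while True:
            url = frontier.get()
            if url is None:   # sentinel: no more work for this thread
                frontier.task_done()
                return
            links = fetch_links(url)
            new_links = []
            with lock:
                repository[url] = links
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        new_links.append(link)
            # Enqueue new URLs before marking this one done, so join() cannot
            # return while work is still pending.
            for link in new_links:
                frontier.put(link)
            frontier.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    frontier.join()           # wait until every queued URL has been processed
    for _ in threads:
        frontier.put(None)    # one sentinel per worker
    for t in threads:
        t.join()
    return repository


if __name__ == "__main__":
    print(sorted(crawl(["a"])))
```

A single lock is the simplest choice here. A real crawler would likely need finer-grained synchronization, politeness delays, and error handling, none of which this sketch attempts.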
