Webscale Information Extraction in KnowItAll.ppt

  • About 4,270 characters
  • About 18 pages
  • Published 2017-03-09 in Shanghai

Web-scale Information Extraction in KnowItAll
Oren Etzioni et al., U. of Washington
WWW’2004

Outline
  • Motivation
  • System Architecture
  • Detail Techniques
      • Search Engine Interface
      • Extractor
      • Probabilistic Assessment
  • Experimental Result
  • Future Work
  • Conclusion

Motivation
  • Why web-scale information extraction?
      • The Web is the largest knowledge base.
      • Extracting information by searching the Web is not easy, for example:
          • list the cities in the world whose population is above 400,000;
          • list the humans who have visited space.
      • Unless we find the “right” document, this work can be a tedious, error-prone process of piecemeal search.

Motivation (2)
  • Previous information extraction work
      • Supervised learning
          • Difficult to scale to the Web, because of:
              • the diversity of the Web
              • the prohibitive cost of creating an equally diverse set of hand-tagged documents
      • Weakly supervised and bootstrapping
          • Need domain-specific seeds
          • Learn rules from seeds, and then vice versa
  • KnowItAll
      • Domain-independent
      • Uses a bootstrapping technique

System Architecture
  • 4 Components
  • Data Flow

System Architecture
  • System Work Flow

System Architecture
  • Search Engine Interface
      • Distributes jobs to different search engines
  • Extractor
      • Rule instantiation
      • Information extraction
  • Assessor
      • Discriminator phrase construction
      • Assessment of information

Search Engine Interface
  • Metaphor: Information Food Chain
      • Search Engine → Herbivore
      • KnowItAll → Carnivore
  • Why build on top of search engines?
      • No need to duplicate existing work
      • Low cost/time/effort
  • Query distribution
      • Make sure not to overload search engines

Extractor
  • Extraction template examples
      • NP1 {“,”} “such as” NPList2
      • NP1 {“,”} “and other” NP2
      • NP1 {“,”} “is a” NP2
  • All are domain-independent!

Extractor (2)
  • Noun phrase analysis
      • A. “China is a country in Asia”
      • B. “Garth Brooks is a country singer”
      • In A, the word “country” is the head of a simple noun phrase.
      • In B, the word “country” is not the head of a simple noun phrase.
      • So, China is indeed a country while Garth Brooks is not a
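The idea of instantiating a domain-independent template for one class can be illustrated with a small sketch. It is not KnowItAll's implementation: the function names (`search_query`, `extract_instances`) are hypothetical, only the “such as” template is covered, and a noun phrase is crudely approximated as a run of capitalized words rather than found by a real noun-phrase chunker.

```python
import re

# Assumption: a proper noun phrase is one or more capitalized words.
# A real system would use a noun-phrase chunker instead of this regex.
PROPER_NP = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"

# Assumption: list items are joined by ", ", " and ", " or " (optionally after a comma).
NP_LIST = rf"{PROPER_NP}(?:(?:, and |, or |, | and | or ){PROPER_NP})*"


def search_query(class_label):
    """Build the quoted phrase query that would be sent to a search engine."""
    return f'"{class_label} such as"'


def extract_instances(class_label, text):
    """Instantiate  NP1 {","} "such as" NPList2  with NP1 = class_label
    and return the candidate instances found in NPList2."""
    pattern = re.compile(rf"{re.escape(class_label)},? such as ({NP_LIST})")
    instances = []
    for match in pattern.finditer(text):
        # Split the matched list back into its individual noun phrases.
        instances.extend(re.findall(PROPER_NP, match.group(1)))
    return instances


if __name__ == "__main__":
    sentence = "We visited cities such as Paris, London and New York last year."
    print(search_query("cities"))
    print(extract_instances("cities", sentence))
```

Because the class label is only substituted at instantiation time, the same code serves “cities”, “countries”, or any other class. The candidates it returns are unverified; in the architecture outlined in the slides they would still go through the assessment step.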
