- Published 2017-03-09 in Shanghai
Web-scale Information Extraction in KnowItAll
Oren Etzioni et al., U. of Washington
WWW 2004

Outline
- Motivation
- System Architecture
- Detailed Techniques
  - Search Engine Interface
  - Extractor
  - Probabilistic Assessment
- Experimental Results
- Future Work
- Conclusion

Motivation
Why Web-scale information extraction?
- The Web is the largest knowledge base.
- Extracting information by searching the Web is not easy. Example tasks:
  - list the cities in the world whose population is above 400,000;
  - list the humans who have visited space.
- Unless we find the “right” document, this work can be a tedious, error-prone process of piecemeal search.

Motivation (2)
Previous information extraction work
- Supervised learning
  - Difficult to scale to the Web, because of:
    - the diversity of the Web
    - the prohibitive cost of creating an equally diverse set of hand-tagged documents
- Weakly supervised learning and bootstrapping
  - Need domain-specific seeds
  - Learn rules from seeds, and vice versa
- KnowItAll
  - Domain-independent
  - Uses a bootstrapping technique

System Architecture
- 4 components
- Data flow

System Architecture: System Work Flow

System Architecture
- Search Engine Interface
  - Distributes jobs to different search engines
- Extractor
  - Rule instantiation
  - Information extraction
- Assessor
  - Discriminator phrase construction
  - Assessment of information

Search Engine Interface
- Metaphor: the information food chain
  - Search engine → herbivore
  - KnowItAll → carnivore
- Why build on top of search engines?
  - No need to duplicate existing work
  - Low cost, time, and effort
- Query distribution
  - Make sure not to overload the search engines

Extractor
Extraction template examples
- NP1 {“,”} “such as” NPList2
- NP1 {“,”} “and other” NP2
- NP1 {“,”} “is a” NP2
All are domain-independent!

Extractor (2)
Noun phrase analysis
- A. “China is a country in Asia”
- B. “Garth Brooks is a country singer”
In A, the word “country” is the head of a simple noun phrase. In B, the word “country” is not the head of a simple noun phrase. So, China is indeed a country while Garth Brooks is not a
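
The extraction templates and the noun-phrase head check on the Extractor slides can be illustrated with a small sketch. This is not KnowItAll's implementation: the slides do not say how noun phrases are detected, so the regular expression `PROPER_NP`, the preposition list, and the function names `such_as_rule`, `extract_instances`, and `is_a_instance` are all assumptions made for illustration. A real system would more likely use a part-of-speech tagger or noun-phrase chunker.

```python
import re

# ASSUMPTION: a proper-noun phrase is approximated as one or more
# capitalized words. This stands in for real noun-phrase chunking.
PROPER_NP = r"[A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*"

# ASSUMPTION: a tiny preposition list used by the head-noun heuristic.
PREPOSITIONS = {"in", "of", "on", "with", "near", "at", "from"}

# Separators between items of an NPList: commas, "and", "or".
LIST_SEPARATOR = r"\s*,\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+"


def such_as_rule(class_plural):
    """Instantiate the template  NP1 {","} "such as" NPList2  for one class."""
    np_list = (
        rf"{PROPER_NP}"
        rf"(?:\s*,\s*(?:and\s+|or\s+)?{PROPER_NP}|\s+(?:and|or)\s+{PROPER_NP})*"
    )
    return re.compile(rf"\b{re.escape(class_plural)}\s*,?\s+such as\s+({np_list})")


def extract_instances(class_plural, text):
    """Return candidate instances of the class found by the "such as" rule."""
    found = []
    for match in such_as_rule(class_plural).finditer(text):
        for item in re.split(LIST_SEPARATOR, match.group(1)):
            if item and item not in found:
                found.append(item)
    return found


def is_a_instance(class_singular, sentence):
    """Apply  NP1 "is a" NP2  and accept the match only when the class word
    looks like the head of its noun phrase.

    Heuristic (an assumption): the class word is treated as the head when it
    is followed by nothing, by punctuation, or by a preposition.
    """
    pattern = (
        rf"^({PROPER_NP})\s*,?\s+is an?\s+"
        rf"{re.escape(class_singular)}\b\s*(\w+)?"
    )
    match = re.search(pattern, sentence)
    if match is None:
        return None
    following = match.group(2)
    if following is None or following.lower() in PREPOSITIONS:
        return match.group(1)
    return None


if __name__ == "__main__":
    text = "He visited cities such as Paris, London and Berlin last year."
    print(extract_instances("cities", text))
    print(is_a_instance("country", "China is a country in Asia"))
    print(is_a_instance("country", "Garth Brooks is a country singer"))
```

With this heuristic, sentence A yields China as a candidate country, while sentence B is rejected because “country” is followed by another noun and so is not the head of its phrase.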