


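Chapter summary
- Introduced the concept and basic framework of question answering (QA) systems.
- Introduced QA techniques such as question classification and paraphrasing.
- Introduced the basic principles of automatic summarization.

Comparison of three classes of methods in the TREC evaluations (as of TREC 2003)

                         IR + IE       IR + Pattern Match   IR + IE + NLP
Representative system    System [19]   System [28, 29]      System [4, 31]
Rank in TREC 2000        -             -                    1
Rank in TREC 2001        -             1                    2
Rank in TREC 2002        3             2                    1
Rank in TREC 2003        2             -                    1

(Rows 2000-2003 give the rank achieved by each class's representative system in that year's TREC.)

QA system example

AskMSR: Shallow approach
In what year did Abraham Lincoln die?
Ignore hard documents and find easy ones.

AskMSR: Details
[Figure: AskMSR pipeline, steps 1-5]

Step 1: Rewrite queries
Intuition: the user's question is often syntactically very similar to a sentence that contains the answer.
- Where is the Louvre Museum located? → The Louvre Museum is located in Paris.
- Who created the character of Scrooge? → Charles Dickens created the character of Scrooge.

Query rewriting
Classify each question into one of 7 categories:
- Who is/was/are/were…?
- When is/did/will/are/were…?
- Where is/are/were…?
a. Category-specific transformation rules, e.g. "For Where questions, move 'is' to all possible locations":
   “Where is the Louvre Museum located”
   → “is the Louvre Museum located”
   → “the is Louvre Museum located”
   → “the Louvre is Museum located”
   → “the Louvre Museum is located”
   → “the Louvre Museum located is”
b. Expected answer "datatype" (e.g. Date, Person, Location, …):
   When was the French Revolution? → DATE
The classification, rewrite, and datatype rules are hand-crafted. (Could they be learned automatically?)

Query rewriting: weights
Some rewrites are more reliable than others. For "Where is the Louvre Museum located?":
- +“the Louvre Museum is located”: weight 5 (if we get a match, it's probably right)
- +Louvre +Museum +located: weight 1 (lots of non-answers could come back too)
[Image: the Louvre Museum]

Step 2: Query the search engine
Submit all rewritten queries to the search engine.
Retrieve the top N results (100?).
For speed, rely only on the search engine's "snippets" rather than the full text of each document.
[Figure: example snippets]

Step 3: Mine N-grams
Unigram, bigram, trigram, …
N-gram: N adjacent terms in a sequence.
Example: "Web Question Answering: Is More Always Better"
- Unigrams: Web, Question, Answering, Is, More, Always, Better
- Bigrams: Web Question, Question Answering, Answering Is, Is More, More Always, Always Better
- Trigrams: Web Question Answering, Question Answering Is, Answering Is More, Is More Always, More Always Better

Mining N-grams
Simple: enumerate all N-grams (N = 1, 2, 3, say) in all retrieved snippets.
The weight of an n-gram depends on its number of occurrences, among other factors.
Example: “Who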
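The Where-question rule from Step 1 ("move 'is' to all possible locations") can be sketched in a few lines of Python. This is a minimal illustration, not AskMSR's actual code: the helper name `where_rewrites` is hypothetical, and it assumes the question literally starts with "Where is" and that words are separated by whitespace.

```python
def where_rewrites(question: str) -> list[str]:
    """Hypothetical helper: apply the 'move "is" to all possible locations'
    rule to a question of the form 'Where is X ...?'."""
    words = question.rstrip("?").split()
    # Assumption: the rule only fires for questions starting with "Where is".
    if len(words) < 3 or words[0].lower() != "where" or words[1].lower() != "is":
        return []
    rest = words[2:]  # e.g. ["the", "Louvre", "Museum", "located"]
    rewrites = []
    # Insert "is" before each word and once at the very end.
    for i in range(len(rest) + 1):
        rewrites.append(" ".join(rest[:i] + ["is"] + rest[i:]))
    return rewrites


if __name__ == "__main__":
    for r in where_rewrites("Where is the Louvre Museum located?"):
        print(r)
```

For the Louvre question this yields exactly the five rewrites listed on the slide, most of them ungrammatical. That is intentional in this shallow approach: a nonsensical rewrite simply matches nothing on the web.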

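The rewrite weights (5 for an exact-phrase match, 1 for a bag of required words) can be attached as shown below. This is a sketch under stated assumptions: the function name `weighted_queries` and the `STOPWORDS` set are hypothetical (the slides do not say which words are dropped for the weight-1 query), and the `+"..."` / `+word` syntax simply mirrors the slide's notation rather than any particular search engine's API.

```python
# Hypothetical stopword list; the slides do not specify which words are dropped.
STOPWORDS = {"where", "is", "the", "a", "an", "of", "who", "what", "when", "was", "did"}

PHRASE_WEIGHT = 5   # exact-phrase rewrite: a match is probably right
BACKOFF_WEIGHT = 1  # required content words only: many non-answers come back too


def weighted_queries(question: str, phrase_rewrites: list[str]) -> list[tuple[str, int]]:
    """Hypothetical helper: pair each query with a reliability weight."""
    # Each phrase rewrite becomes a required exact phrase with the high weight.
    queries = [(f'+"{r}"', PHRASE_WEIGHT) for r in phrase_rewrites]
    # Back-off query: every non-stopword of the question is required.
    content = [w for w in question.rstrip("?").split() if w.lower() not in STOPWORDS]
    queries.append((" ".join("+" + w for w in content), BACKOFF_WEIGHT))
    return queries


if __name__ == "__main__":
    for q, w in weighted_queries("Where is the Louvre Museum located?",
                                 ["the Louvre Museum is located"]):
        print(w, q)
```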

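The hand-crafted expected-answer "datatype" rules can be pictured as a small prefix table. The Who/When/Where rules follow the slide's categories; the "in what year" → DATE rule is an assumption drawn from the Lincoln example, and the names `RULES` and `expected_answer_type` are hypothetical. A real system would need more categories (the slide mentions 7) and more robust matching.

```python
# Hypothetical rule table: leading question words -> expected answer datatype.
RULES = [
    (("who",), "PERSON"),
    (("when",), "DATE"),
    (("where",), "LOCATION"),
    (("in", "what", "year"), "DATE"),  # assumption, based on the Lincoln example
]


def expected_answer_type(question: str) -> str:
    """Return the datatype of the first rule whose prefix matches the question."""
    words = question.lower().rstrip("?").split()
    for prefix, answer_type in RULES:
        if tuple(words[: len(prefix)]) == prefix:
            return answer_type
    return "UNKNOWN"


if __name__ == "__main__":
    print(expected_answer_type("When was the French Revolution?"))
```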

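The n-gram definition in Step 3 translates directly into code. This sketch assumes tokens are runs of letters, digits, and apostrophes, so the colon in the example title is dropped; the helper names `tokenize` and `ngrams` are illustrative.

```python
import re


def tokenize(text: str) -> list[str]:
    """Assumed tokenizer: keep runs of letters, digits and apostrophes."""
    return re.findall(r"[A-Za-z0-9']+", text)


def ngrams(tokens: list[str], n: int) -> list[str]:
    """All sequences of n adjacent tokens, in order of appearance."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    toks = tokenize("Web Question Answering: Is More Always Better")
    for n in (1, 2, 3):
        print(n, ngrams(toks, n))
```

A sequence of k tokens has k - n + 1 n-grams, which is why the 7-word title gives 7 unigrams, 6 bigrams, and 5 trigrams.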

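Steps 2 and 3 can be combined into a simple weighted n-gram miner. This sketch makes several assumptions: the search engine call is left out, and the snippets are given directly, each tagged with the weight of the rewrite that retrieved it; each distinct n-gram in a snippet adds that weight once (one plausible reading of "weight depends on the number of occurrences, etc."); and the snippets in the usage block are invented toy data, not real search results.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Assumed tokenizer: keep runs of letters, digits and apostrophes."""
    return re.findall(r"[A-Za-z0-9']+", text)


def mine_ngrams(snippets: list[tuple[str, int]], max_n: int = 3) -> Counter:
    """snippets: (snippet_text, weight_of_the_rewrite_that_retrieved_it) pairs.
    Returns a Counter mapping each n-gram (1 <= n <= max_n) to its summed weight."""
    scores = Counter()
    for text, weight in snippets:
        tokens = tokenize(text)
        # Collect the distinct n-grams of this snippet so each counts once here.
        seen = set()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                seen.add(" ".join(tokens[i:i + n]))
        for gram in seen:
            scores[gram] += weight
    return scores


if __name__ == "__main__":
    toy_snippets = [  # invented toy data for illustration only
        ("Abraham Lincoln died in 1865", 5),
        ("Lincoln was shot in 1865 and died the next day", 1),
        ("Abraham Lincoln was born in 1809", 1),
    ]
    scores = mine_ngrams(toy_snippets)
    print(scores["1865"], scores["1809"])
```

In this toy data "1865" (weight 6) outscores "1809" (weight 1) because it also appears in the snippet retrieved by the reliable weight-5 rewrite. Question words such as "Lincoln" score even higher, so raw n-gram counts alone are not yet answers.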
