Lecture 4 The Maximum-Entropy Stewpot.ppt

Published 2018-06-19 from Fujian


600.465 - Intro to NLP - J. Eisner

The Maximum-Entropy Stewpot

Probability is Useful

We love probability distributions! We've learned how to define and use p(…) functions.

- Pick best output text T from a set of candidates
    - speech recognition (HW2); machine translation; OCR; spell correction ...
    - maximize p1(T) for some appropriate distribution p1
- Pick best annotation T for a fixed input I
    - text categorization; parsing; part-of-speech tagging …
    - maximize p(T | I); equivalently maximize joint probability p(I, T)
    - often define p(I, T) by noisy channel: p(I, T) = p(T) * p(I | T)
    - speech recognition and other tasks above are cases of this too: we're maximizing an appropriate p1(T) defined by p(T | I)
- Pick best probability distribution (a meta-problem!)
    - really, pick best parameters θ: train HMM, PCFG, n-grams, clusters …
    - maximum likelihood; smoothing; EM if unsupervised (incomplete data)
    - Bayesian smoothing: max over θ of p(θ | data) = max over θ of p(θ, data), where p(θ, data) = p(θ) p(data | θ)

Probability is Flexible

We love probability distributions! We've learned how to define and use p(…) functions.

- We want p(…) to define probability of linguistic objects
    - Trees of (non)terminals (PCFGs; CKY, Earley, pruning, inside-outside)
    - Sequences of words, tags, morphemes, phonemes (n-grams, FSAs, FSTs; regex compilation, best-paths, forward-backward, collocations)
    - Vectors (decision lists, Gaussians, naïve Bayes; Yarowsky, clustering/k-NN)
- We've also seen some not-so-probabilistic stuff
    - Syntactic features, semantics, morphology, Gold. Could be stochasticized?
    - Methods can be quantitative and data-driven but not fully probabilistic: transformation-based learning, bottom-up clustering, LSA, competitive linking
- But probabilities have wormed their way into most things
- p(…) has to capture our intuitions about the linguistic data

An Alternative Tradition

- Old AI hacking technique:
    - Possible parses (or whatever) have scores.
    - Pick the one with the best score.
    - How do you define the score?
        - Completely ad hoc!
        - Throw anything you want into the stew
        - Add a bonus for this, a