Computational Linguistics 计算语言学.ppt

About 17,600 characters
About 105 pages
Posted 2018-01-04 (Zhejiang)

Computational Linguistics 计算语言学

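9/13/1999 Intro to NLP JHU CS 600.465/Jan Hajic

Computational Linguistics
史晓东 (Shi Xiaodong), Xiamen University
mandel@ 46/course_cl.html

Outline
  n-gram language models
  Smoothing algorithms (1)
  Smoothing algorithms (2)
  Advanced topics
  Statistical LM Toolkit
  A few cartoons

What is a Language Model?
A language model is a probability distribution over word sequences.
  P("And nothing but the truth") ≈ 0.001
  P("An nuts sing on de roof") ≈ 0
A language model assigns a probability to every word string (sentence): high for well-formed strings, low for ill-formed ones.
The probabilities of all word sequences must sum to 1.

A bad language model
When speech is hard to make out, every listener tries to guess what the speaker said (the technical term is decoding) and picks the reading they find most plausible.
The witness's language model is poor: what the witness says does not look like English ("de"). The witness probably does not understand English very well.
My own Chinese language model is poor too. I remember a song from my childhood, "走在乡间的小路上" ("Walking on a country lane"), with the line "？？的老牛是我同伴" ("the ?? old ox is my companion"). Was it 暮归 (returning at dusk), 魔鬼 (devil), or 磨谷 (grinding grain)?

Advantages of a probabilistic model
The language teacher says: "You can only say it this way!" (prescriptive)
The language model says: "It is better to say it this way!", "Everyone says it this way!"
Continuous models vs. discrete models (the subject of our discussion)

Language models are indispensable in many applications
  Speech recognition
  Handwriting recognition
  Optical character recognition (OCR)
  Spelling correction
  Information retrieval
  Machine translation
For example, speech recognition.

How to compute the probability: chain rule
  P("And nothing but the truth") = P("And") × P("nothing" | "And") × P("but" | "And nothing") × P("the" | "And nothing but") × P("truth" | "And nothing but the")

Markov approximation
Assume each word depends only on a limited local context, e.g. on the previous two words. This is called a trigram model.
  P("the" | "… whole truth and nothing but") ≈ P("the" | "nothing but")
  P("truth" | "… whole truth and nothing but the") ≈ P("truth" | "but the")

With Markov Assumption

Caveat
The formulation P(word | some fixed prefix) is not really appropriate in many applications. It holds only for real-time speech, where we have access only to prefixes. With text, we already have both the left and the right context, so there is no a priori reason to stick to left contexts.
For w1 w2 w3, one can use P(w3|w1w2), and one can also use P(w2|w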