Learning a Spelling Error Model from Search Query Logs
Farooq Ahmad
Department of Electrical and
Computer Engineering
University of Alberta
Edmonton, Canada
farooq@ualberta.ca
Grzegorz Kondrak
Department of Computing Science
University of Alberta
Edmonton, Canada
kondrak@cs.ualberta.ca
Abstract
Applying the noisy channel model to search query spelling correction requires an error model and a language model. Typically, the error model relies on a weighted string edit distance measure. The weights can be learned from pairs of misspelled words and their corrections. This paper investigates using the Expectation Maximization algorithm to learn edit distance weights directly from search query logs, without relying on a corpus of paired words.
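The interaction between the two models can be illustrated with a minimal sketch. It is not the method developed in this paper: the function names, the toy unigram counts, and the choice to treat the weighted edit distance directly as a negative log error probability are all assumptions made for illustration. Unlisted substitutions, insertions, and deletions default to a cost of 1.

```python
import math


def weighted_edit_distance(source, target, sub_cost=None,
                           ins_cost=1.0, del_cost=1.0):
    """Edit distance where substitutions may carry per-pair weights.

    sub_cost maps (source_char, target_char) to a cost; pairs that are
    not listed cost 1.0 (an assumption made for this sketch).
    """
    sub_cost = sub_cost or {}
    m, n = len(source), len(target)
    dist = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = dist[i - 1][0] + del_cost
    for j in range(1, n + 1):
        dist[0][j] = dist[0][j - 1] + ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            a, b = source[i - 1], target[j - 1]
            sub = 0.0 if a == b else sub_cost.get((a, b), 1.0)
            dist[i][j] = min(dist[i - 1][j] + del_cost,      # delete a
                             dist[i][j - 1] + ins_cost,      # insert b
                             dist[i - 1][j - 1] + sub)       # match or substitute
    return dist[m][n]


def correct(query, unigram_counts, sub_cost=None):
    """Return the vocabulary word w maximizing log P(w) + log P(query | w).

    log P(query | w) is approximated by the negated weighted edit
    distance, which is an illustrative simplification.
    """
    total = sum(unigram_counts.values())

    def score(candidate):
        log_prior = math.log(unigram_counts[candidate] / total)
        log_error = -weighted_edit_distance(candidate, query, sub_cost)
        return log_prior + log_error

    return max(unigram_counts, key=score)


# Hypothetical toy language model: raw word counts.
counts = {"weather": 50, "whether": 40, "leather": 10}
best = correct("wether", counts)
```

Here both "weather" and "whether" are one deletion away from "wether", so the language model breaks the tie in favor of the more frequent word.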
1 Introduction
There are several sources of error in written language. Typing errors can be divided into two groups (Kukich, 1992): typographic errors and cognitive errors. Typographic errors are the result of mistyped keys and can be described in terms of keyboard key proximity. Cognitive errors, on the other hand, are caused by a misunderstanding of the correct spelling of a word. They include phonetic errors, in which similar-sounding letter sequences are substituted for the correct sequence, and homonym errors, in which a word is substituted for another word with the same pronunciation but a different meaning. Spelling errors can also be grouped into errors that result in another valid word, such as homonym errors, and errors that result in a non-word. Generally, non-word errors are easier to detect and correct. In addition to its traditional use in word processing, spelling correction also has applications in optical character recognition and handwriting recognition. Spelling errors in this context are caused by inaccurate character recognition.
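The notion of keyboard key proximity can be made concrete with a small sketch. The QWERTY layout, the row offsets that approximate the physical stagger of the keys, and the function name key_distance are assumptions chosen for illustration; they are not taken from this paper.

```python
import math

# Letter rows of a QWERTY keyboard with approximate horizontal offsets
# (in key widths). The offsets are illustrative assumptions.
QWERTY_ROWS = [("qwertyuiop", 0.0), ("asdfghjkl", 0.25), ("zxcvbnm", 0.75)]

KEY_POSITION = {
    key: (col + offset, row)
    for row, (keys, offset) in enumerate(QWERTY_ROWS)
    for col, key in enumerate(keys)
}


def key_distance(a, b):
    """Euclidean distance between two letter keys, in key widths."""
    (x1, y1), (x2, y2) = KEY_POSITION[a], KEY_POSITION[b]
    return math.hypot(x1 - x2, y1 - y2)


# Neighboring keys are closer than keys on opposite sides of the board,
# so a typographic model could make 'a' -> 's' cheaper than 'a' -> 'p'.
near = key_distance("a", "s")
far = key_distance("a", "p")
```

A distance of this kind could serve as a prior for substitution costs in a weighted edit distance, although cognitive errors such as phonetic substitutions would not be captured by it.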
Spelling correction is a well-developed research problem in the field of computational linguistics. The first dictionary-based approach to spelling correction (