An efficient phrase-to-phrase alignment model for arbitrarily long phrase and large corpora.pdf
EAMT 2005 Conference Proceedings 1
An Efficient Phrase-to-Phrase Alignment Model for Arbitrarily
Long Phrase and Large Corpora
Ying Zhang Stephan Vogel
Language Technologies Institute
School of Computer Science
Carnegie Mellon University
{joy+,vogel+}@
Abstract. Most statistical machine translation (SMT) systems use phrase-to-phrase
translations to capture local context information, leading to better lexical choices and more
reliable word reordering. Long phrases capture more context than short phrases and result
in better translation quality. On the other hand, the increasing amount of bilingual data
makes storing all possible phrases a serious problem. In this paper, we describe a novel
phrase-to-phrase alignment model which allows for arbitrarily long phrases and works for
very large bilingual corpora. This model is very efficient in both time and space, and the
resulting translations are better than those of state-of-the-art systems.
1. Introduction
In recent years, various phrase-to-phrase
translation models (Och 1999; Marcu & Wong
2002; Koehn 2003; Zhang 2003) have shown
great advantages over word-based systems
(Brown 1990). We believe that longer phrases
encapsulate more of the context of the words, so
their translation quality is expected to be higher
than that of short phrases. Unfortunately, given
the increasing volume of parallel bilingual
data for some major languages such as Arabic
and Chinese, storing and loading all possible
phrase translations from the training corpus
becomes more and more expensive in terms of
space and computation time. To keep the
phrasal translation model at a reasonable size,
some models (Koehn 2003; Zhang 2003)
limit the length of phrases to no more
than 3 words, while others (Vogel 2003)
sub-sample the training corpus based on the test
data to down-scale the problem. In this paper,
we introduce a new strategy to cope with this
problem. Instead of aligning the phr