Developments Towards an Electronic Amharic Corpus
TALN 2005, Dourdan, 6-10 juin 2005
Daniel Yacob
Ge’ez Frontier Foundation
7802 Solomon Seal Dr, Springfield, VA 22152, USA
yacob@
Abstract
The state of Amharic natural language processing was aptly assessed at TALN 2003
by Atelach, Asker and Mesfin. A public Amharic corpus and a comprehensive
lexicon were two of the resources that Amharic language researchers most needed
and lacked. Since the 2003 assessment, some progress has been made in these two
areas and researchers have begun informal collaboration to address the common goal
of developing these public resources. In the same period, changes to Ethiopia’s legal
system have clouded the legal status of an Amharic corpus.
While a promising start is underway, corpus developers and researchers alike will
have to familiarize themselves with the new legislation in Ethiopia and reexamine the
status of their holdings to avoid unintended violations.
1 Introduction
Amharic is the most studied and best understood language of Ethiopia; it also serves
as the country’s lingua franca. Researchers today, both inside and outside of Ethiopia,
are increasingly interested in computational investigations of the Amharic language.
The lack of freely available electronic corpora, a lexicon, and a transcription standard,
coupled with the complexities of Amharic orthography, is a significant barrier to
would-be researchers.
Amharic, along with its ten sibling Ethio-Semitic languages found in Ethiopia
and neighboring Eritrea, is written in the Ethiopic syllabary. Amharic has been a
written language for roughly 600 years and has a rich legacy of both typeset and
calligraphic literature. Significant amounts of electronic corpora in