National Corpus Construction and Research on Chinese Word Lists (Thesis, Computer Software and Theory)

About 82,600 characters; about 85 pages; posted 2019-04-12, Shanghai.

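Abstract

This thesis first reviews the history of corpus construction and corpus linguistics and surveys the current state of corpus construction in China and abroad. It gives particular attention to the National Corpus project, because the work reported here was carried out as part of research on that project; this is the background of the study. Second, it presents the main research I conducted during my two years as a postdoctoral researcher: a processing specification for the corpus and the structured processing of the word list. Both lines of work were guided by Prof. Lu Ruzhan's Chinese Semantic Logic Model theory, which sets them apart from both traditional Chinese linguistics and conventional Chinese information processing. For the word list, we proposed the concept of a structured word list, carried out research based on it, and performed three rounds of structured processing on a word list of approximately 90,000 words; the results have already been applied in processing the National Corpus, with significant effect. For the processing specification, we proposed a structured tagging method built on the structured word list, which allows the processing results to meet the needs of different application systems. Moreover, because the structured word list records a great deal of lexical information, relative independence is maintained between the design and revision of the segmentation-and-tagging software and the segmentation-and-tagging specification, between the word list and that software, and between corpus proofreading and the final output. This reduces the workload of manual proofreading while ensuring the consistency and accuracy of corpus processing. Finally, the thesis outlines plans for future work and proposes concrete directions for further effort.

Keywords: corpus; processing specification; structured word list; Chinese Semantic Logic Model

English Abstract

This paper first reviews the history of corpus construction and corpus linguistics and introduces the current state of corpus construction at home and abroad. It focuses on the basic situation of the National Corpus construction, since the work in this paper was accomplished through research on the National Corpus construction project; this is the background of the study. Second, this paper introduces the main postdoctoral research I have done during the past two years: the processing specification for the corpus and the structured processing of the word list. Both were carried out under the guidance of Prof. Lu's Chinese Semantic Logic Model Theory, so they differ from traditional Chinese studies and from general Chinese information processing. In the structured processing of the word list, we proposed the concept of a structured word list and began our research from that concept. We have processed a word list of approximately 90,000 words three times in this structured way, and the results have been used in processing the National Corpus with good effect. In the processing specification, we proposed a structured tagging method based on the structured word list, so that the processing results can fit the needs of different applications. Since we stored a lot of lexical information in the structured word list, we made the tagging specification independent of the design and implementation of the tagging software, …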
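The abstract describes keeping rich lexical information in a structured word list so that one annotated corpus can serve different applications, while the word list, the tagging software, and the output stay relatively independent. A minimal Python sketch of that idea follows. It is a hypothetical illustration only: the `LexEntry` fields (`pos`, `subclass`, `parts`), the sample entries, and the three view functions are assumptions for exposition, not the thesis's actual specification, tag set, or software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexEntry:
    """One entry in a hypothetical structured word list."""
    pos: str             # coarse part-of-speech tag, e.g. "n" or "v" (assumed tag set)
    subclass: str = ""   # optional finer lexical class, e.g. "org" (assumed)
    parts: tuple = ()    # internal segmentation if the word is a compound


# Hypothetical structured word list: in this sketch the lexical detail lives
# here, not in the tagging code and not in the stored corpus.
LEXICON = {
    "上海交通大学": LexEntry("n", "org", ("上海", "交通", "大学")),
    "研究": LexEntry("v"),
    "语料库": LexEntry("n", parts=("语料", "库")),
}


def coarse_tags(tokens):
    """View for one application: word/POS pairs using coarse tags only."""
    return [f"{t}/{LEXICON[t].pos}" for t in tokens]


def fine_tags(tokens):
    """View for another application: POS plus subclass where one is recorded."""
    out = []
    for t in tokens:
        e = LEXICON[t]
        tag = f"{e.pos}-{e.subclass}" if e.subclass else e.pos
        out.append(f"{t}/{tag}")
    return out


def fine_segmentation(tokens):
    """View for a third application: expand compounds into their recorded parts."""
    out = []
    for t in tokens:
        # An empty parts tuple is falsy, so non-compounds are kept whole.
        out.extend(LEXICON[t].parts or (t,))
    return out


# In this sketch the stored corpus is just a sequence of word-list keys.
corpus = ["上海交通大学", "研究", "语料库"]
print(coarse_tags(corpus))        # ['上海交通大学/n', '研究/v', '语料库/n']
print(fine_tags(corpus))          # ['上海交通大学/n-org', '研究/v', '语料库/n']
print(fine_segmentation(corpus))  # ['上海', '交通', '大学', '研究', '语料', '库']
```

Because every view reads the same token sequence and looks up details in the word list, serving a new application means adding a view function rather than re-tagging the corpus, and correcting a word-list entry changes every view at once. This is a simplified analogue of the independence the abstract claims, not a reconstruction of the author's system.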