基于语料统计地以-不-开头双字分词不一致研究.pdf

About 9,640 characters
About 6 pages
Published 2017-08-19 from Anhui


A Corpus-Statistics-Based Study of Segmentation Inconsistency in Two-Character Strings Beginning with “不”

Cheng Yue, Ji Na, Hong Luping (School of Chinese Language and Literature, Nanjing Normal University, Nanjing 210046)

Abstract: Segmentation inconsistency is widespread in large-scale corpora and degrades the quality of corpus construction. Based on a statistical analysis of a processed (segmented) corpus, this study focuses on two-character structures beginning with “不” and analyzes in depth why their segmentation is inconsistent. Taking a new angle, it classifies these structures in detail using the concept of sets and identifies a series of causes of combinatorial ambiguity and segmentation variation.

Keywords: segmentation inconsistency; two-character strings beginning with “不”; combinatorial ambiguity; segmentation variation

Introduction

Word segmentation is the indispensable first step in the automatic analysis of Chinese. Segmentation inconsistency is a major difficulty for automatic segmentation and bears directly on corpus construction. The national standard 《信息处理用现代汉语分词规范》 (Contemporary Chinese Word Segmentation Specification for Information Processing, hereafter "the Specification"), approved and issued in 1988, starts from the practical requirements of information processing, …
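The kind of corpus count the abstract refers to can be illustrated with a minimal sketch. It is not the authors' procedure: the function names, the toy corpus, and the simplifying rule that a "split" occurrence means the token “不” immediately followed by a single Chinese-character token are all assumptions made for illustration.

```python
from collections import defaultdict

def count_bu_segmentations(sentences, prefix="不"):
    """Count, for each two-character string starting with `prefix`,
    how often it appears as one token ("joined") and how often as
    `prefix` followed by a one-character token ("split").

    `sentences` is an iterable of token lists from an already
    segmented corpus (hypothetical input format).
    """
    counts = defaultdict(lambda: {"joined": 0, "split": 0})
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if len(tok) == 2 and tok[0] == prefix:
                counts[tok]["joined"] += 1
            elif tok == prefix and i + 1 < len(tokens):
                nxt = tokens[i + 1]
                # Assumption: only a single CJK character counts,
                # so punctuation and longer words are ignored.
                if len(nxt) == 1 and "\u4e00" <= nxt <= "\u9fff":
                    counts[prefix + nxt]["split"] += 1
    return counts

def inconsistent(counts):
    """Keep only strings that occur both joined and split."""
    return {s: c for s, c in counts.items()
            if c["joined"] > 0 and c["split"] > 0}

# Toy corpus, invented for illustration only.
toy_corpus = [
    ["他", "不是", "学生"],
    ["他", "不", "是", "老师"],
    ["我", "不", "去"],
    ["这", "不好"],
]

toy_counts = count_bu_segmentations(toy_corpus)
print(sorted(inconsistent(toy_counts)))  # → ['不是']
```

A string flagged this way is only a candidate. Deciding whether each occurrence is a genuine combinatorial ambiguity or a segmentation variation still requires looking at its context.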
