
Chinese Segmentation - National Yunlin University of Science and Technology, Intelligent Database Systems Lab (bb.ppt)


5-1. Longest matching with two dictionaries
  Result:
    Small dictionary (65,502 entries): average precision 0.3797
    Large dictionary (220K entries): average precision 0.3907
  Example: 「作業系統」 ("operating system")
  Dictionary size is not the only factor.

N.Y.U.S.T. I.M.

5-2. Single characters with longest words
  Reason: short words contained in longer words are otherwise ignored.
  Result:
    Small dictionary: average precision 0.4058 (improvement 6.9%)
    Large dictionary: average precision 0.4290 (improvement 9.8%)
  Example: 「作業系統」 ("operating system"), 「作業」 ("operation"), 「系統」 ("system")
  This is more effective than increasing the dictionary size.

Intelligent Database Systems Lab
國立雲林科技大學 (National Yunlin University of Science and Technology)
  Advisor: Dr. Hsu
  Graduate: Chien-Shing Chen
  Author: Jiangfeng Gao, Jian Zhang, Ming Zhou

On the Use of Words and N-grams for Chinese Information Retrieval
  November 2000, Proceedings of the Fifth International Workshop on Information Retrieval with Asian Languages

Outline
  Motivation
  Objective
  Introduction
  Chinese Segmentation
    Words, characters, longest-matching algorithm, full segmentation, n-grams, bi-grams, uni-grams, TFIDF
  Experiments
  Conclusions
  Opinion

Motivation
  Both words and n-grams have been used to index Chinese text.
  Experiments compare different ways of using them, including combining words with n-grams.
  How much does the accuracy of word segmentation matter?
  Is it worthwhile to combine words with n-grams?
  Time and space performance
  Unknown words

Objective
  Report results on the relationship between word segmentation, n-grams, and the performance of Chinese IR.
  Find a good way to index Chinese texts.

1-1. Introduction
  Something has to be done to segment sentences into shorter units that ma
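The segmentation strategies named in slides 5-1 and 5-2 and in the outline can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the toy three-word dictionary, and the decision to emit every shorter dictionary word contained in a longest match are all assumptions made for the example. The slides do not specify how contained words or single characters are extracted.

```python
def longest_match(text, dictionary):
    """Greedy forward longest matching (assumed variant).

    At each position, take the longest dictionary word that starts there;
    if none matches, fall back to a single character.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def contained_words(word, dictionary):
    """Shorter dictionary words (two or more characters) found inside `word`."""
    found = []
    for start in range(len(word)):
        for end in range(start + 2, len(word) + 1):
            piece = word[start:end]
            if piece != word and piece in dictionary:
                found.append(piece)
    return found


def index_terms(text, dictionary):
    """Longest matches plus the shorter words they contain (hypothetical helper)."""
    terms = []
    for token in longest_match(text, dictionary):
        terms.append(token)
        terms.extend(contained_words(token, dictionary))
    return terms


def bigrams(text):
    """Overlapping character bi-grams, the dictionary-free alternative."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


# Toy dictionary, assumed for illustration only.
toy_dictionary = {"作業系統", "作業", "系統"}

print(longest_match("作業系統", toy_dictionary))  # ['作業系統']
print(index_terms("作業系統", toy_dictionary))    # ['作業系統', '作業', '系統']
print(bigrams("作業系統"))                        # ['作業', '業系', '系統']
```

With longest matching alone, a query for 「系統」 would not match a document indexed only under 「作業系統」. Indexing the contained words as well closes that gap, which is consistent with the precision gain reported in slide 5-2.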
