Chinese Word Segmentation Manual (中文分词说明书.doc)


June 2008

Abstract

With the rapid growth of information, search engines have become the tool of choice for finding information. Queries may contain both Western-language and Chinese text, and the two differ. In Western scripts such as English, words are separated by spaces, so a computer can split them easily. In a Chinese sentence there is no explicit delimiter between words, so splitting the sentence into words requires Chinese word segmentation.

This project studies Chinese word segmentation algorithms and applies one in a search system for the computer science domain. The system uses a mechanical (dictionary-based) segmentation algorithm, which splits Chinese text into words by comparing it against a dictionary.

A search engine does not match the whole query string; it divides the query into keywords and searches for those. The segmentation algorithm designed for this system mainly uses forward maximum matching to extract words of two or more characters, which improves segmentation speed as well as search speed and efficiency. The system is built on Java and uses related technologies including Struts, Hibernate, and JSP. It offers good readability, operability, maintainability, extensibility, and portability.

Keywords: Chinese word segmentation; dictionary; search engine
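The forward maximum matching approach named in the abstract can be illustrated with a minimal sketch. This is not the author's Java implementation; the function name `forward_max_match`, the `max_word_len` parameter, and the sample dictionary are assumptions made for illustration. The sketch also assumes that characters not starting any dictionary word of two or more characters are simply skipped, which is one plausible reading of "extract words of two or more characters".

```python
def forward_max_match(text, dictionary, max_word_len=4):
    """Greedy forward maximum matching (illustrative sketch).

    At each position, try the longest candidate first and shrink it
    until a dictionary word of at least two characters is found.
    Assumption: unmatched single characters are skipped, not emitted.
    """
    words = []
    i = 0
    while i < len(text):
        longest = min(max_word_len, len(text) - i)
        # Candidate lengths run from `longest` down to 2.
        for size in range(longest, 1, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                words.append(candidate)
                i += size
                break
        else:
            # No word of two or more characters starts here; move on.
            i += 1
    return words


if __name__ == "__main__":
    # Hypothetical sample dictionary, not taken from the original system.
    sample_dictionary = {"中文", "分词", "中文分词", "搜索", "引擎", "搜索引擎", "技术"}
    print(forward_max_match("搜索引擎使用中文分词技术", sample_dictionary))
```

Because the longest candidate is tried first, "搜索引擎" is preferred over the shorter "搜索" when both are in the dictionary. Lowering `max_word_len` to 2 would instead yield the shorter words, which shows how the window size affects the result.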
