Cross-Language Text Filtering Based on Text Concepts and kNN




Computational Linguistics and Chinese Language Processing, Vol. 7, No. 1, February 2002, pp. 79-90
© The Association for Computational Linguistics and Chinese Language Processing

Cross-Language Text Filtering Based on Text Concepts and kNN

Weifeng Su (蘇偉峰), Shaozi Li (李紹滋), Tanqiu Li (李堂秋), Wenjian You (尤文建)

Abstract (from the Chinese 摘要)

This paper presents a model that filters, from a large volume of Chinese or English information, the documents that match a user's interests. The texts the user is interested in are represented by a cluster of vectors in a vector space of classifiable sememes. Each text to be processed is likewise represented as a vector in the classifiable-sememe space and is compared with its k nearest vectors in that space, which determines whether the text is presented to the user. Experiments show that this is a fairly good filtering method.

Keywords: classifiable sememes, vector space, kNN, text representation, HowNet

Abstract

The WWW is increasingly being used as a source of information. Users access this volume of information with direct-manipulation tools. Obviously, we would like a tool that keeps the texts we want and removes the texts we do not want from the large flow of information reaching us. This paper describes a module that sifts through the large number of texts retrieved by the user. The module is based on HowNet, a knowledge dictionary developed by Mr. Zhendong Dong. In this dictionary, the concept of a word is divided into sememes. In the philosophy of HowNet, all concepts in the world can be expressed by a combination of more than 1500 sememes. The sememe is a very useful concept for settling the problem of synonymy, which is the most difficult problem in text filtering. We classified the set of sememes into two sets: classifiable sememes and unclassifiable sememes. Classfia