Research and Implementation of Text Feature Representation and Text Feature Selection in Text Mining: Graduation Thesis (Design)

About 17,700 characters
About 30 pages
Posted 2018-12-03 (Guangxi)

Jishou University
Graduation Design (Thesis)
Title: Research and Implementation of Text Feature Representation and Text Feature Selection in Text Mining
School: School of Mathematics and Computer Science (数计学院)
Advisor:
Major:
Student: Ruan Chengxuan (阮程宣)
Student ID:

Research and Implementation of Text Feature Representation and Text Feature Selection in Text Mining

Abstract: Text mining, also called text data mining or knowledge discovery in text, is the process of discovering implicit, previously unknown, and potentially useful patterns in large collections of text. This thesis first gives an overview of text mining, presenting its definition and the current state of research. It then describes methods for text feature representation and text feature selection, introducing several algorithms commonly used for both tasks. Based on a comparison of these algorithms, the thesis adopts TFIDF for feature representation and feature selection; TFIDF has long been favored by researchers and in many application domains because it is relatively simple and achieves comparatively high accuracy. Because the work focuses on feature representation and feature selection, word segmentation is not studied: the Chinese documents to be mined are segmented into words manually. TFIDF is then applied to the words of the selected documents to compute the weight of each feature term in the text. These feature terms can also be converted into structured data and stored as an intermediate representation of the text. Finally, a range of weight values is defined in the algorithm as the feature selection criterion, and the terms falling within that range are taken as the key information of the text. The thesis implements TFIDF in a program that computes feature weights to obtain the feature terms and key information of a text, as an applied simulation of the chosen topic.

Keywords: text mining, feature representation, feature selection, vector space model, TFIDF.
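The abstract outlines the pipeline only in words: manually segmented documents, a TFIDF weight for each feature term, a structured intermediate representation, and selection of terms by a range of weight values. The following is a minimal Python sketch of that pipeline under stated assumptions, not the thesis's own program. It assumes the common variant tf = term count / document length and idf = ln(N / df), where N is the number of documents and df the number of documents containing the term. The function names `tfidf_weights`, `to_vectors`, and `select_features`, the `low`/`high` bounds, and the sample documents are all hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per pre-segmented document.

    Assumed TF-IDF variant (hypothetical choice, not taken from the thesis):
    tf = term count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # df[t] = number of documents that contain term t at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        weights.append(
            {t: (c / length) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


def to_vectors(weights: list[dict[str, float]]) -> tuple[list[str], list[list[float]]]:
    """Map the per-document dicts onto one shared, sorted vocabulary (vector space model)."""
    vocab = sorted({t for w in weights for t in w})
    return vocab, [[w.get(t, 0.0) for t in vocab] for w in weights]


def select_features(weights: dict[str, float], low: float, high: float = math.inf) -> list[str]:
    """Keep the terms whose weight lies in [low, high], highest weight first."""
    kept = [(t, w) for t, w in weights.items() if low <= w <= high]
    kept.sort(key=lambda tw: (-tw[1], tw[0]))  # ties broken alphabetically
    return [t for t, _ in kept]


if __name__ == "__main__":
    # Hypothetical documents, assumed to be already segmented into words.
    docs = [
        ["text", "mining", "feature", "representation"],
        ["text", "feature", "selection"],
        ["tfidf", "weight", "feature", "weight"],
    ]
    weights = tfidf_weights(docs)
    vocab, vectors = to_vectors(weights)
    print(vocab)
    print(select_features(weights[0], low=0.2))  # ['mining', 'representation']
    print(select_features(weights[2], low=0.2))  # ['weight', 'tfidf']
```

The per-document dicts play the role of the structured intermediate representation, and `to_vectors` lays them out as vector space model rows. With this idf variant a term that occurs in every document (here `feature`) gets weight 0, so any positive lower bound discards it; a smoothed idf would change the exact numbers. Manually segmented Chinese words can be passed as the token lists in the same way.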