Analysis and System Implementation of a Web-Based Text Classification Algorithm

  • About 34,600 characters
  • About 51 pages
  • Published 2018-05-18 in Shanghai

Abstract

In recent years, with the rapid development of the Internet, the volume of electronic text presented on the Web has expanded at a geometric rate. How to organize and manage these data effectively, and how to deliver the required information to users comprehensively, accurately, and quickly, is an important challenge for current research in information technology. Text categorization offers a good solution to the problem of finding information accurately and quickly in disorganized data. Automatic text categorization is an important technology for organizing and processing large quantities of text information.

Earlier text categorization was based only on pure text. With the growing popularity of the Internet and the rapid development of Web technology, more and more digital information is presented in the form of web pages, and the Web is becoming the most important channel through which users obtain information. How to find useful information quickly in a distributed, heterogeneous, semi-structured web environment, and how to extract knowledge from web pages, has become a core issue in data mining and knowledge management.

This thesis discusses the implementation of a Web-based text categorization system, which consists of two parts: extracting text from web pages and text categorization. We first describe the latest research on automatic text categorization both at home and abroad. We then discuss text collection and text categorization in depth and propose solutions for each. Methods for page analysis and URL reduction are given for the crawler, and a mask-based text extraction method is proposed. We also present approaches to word segmentation, feature extraction, and text categorization.

The system prototype TCViewer (Text Categorization Viewer) is the main achievement of our research. At the end of the thesis, we carry out two experiments, on text collection and on text categorization, to verify the effectiveness of the system.

Key Words: Text Collection, Tex
