Financial-Domain Forum Search and Opinion Classification - Computer Science and Technology Thesis.docx

  • About 47,400 characters
  • About 50 pages
  • Published 2019-02-22 in Shanghai


Harbin Institute of Technology, Master of Engineering Thesis

Abstract

In recent years, with the rapid development of Web 2.0, the Internet has grown into a carrier of massive, content-rich information. New forms of knowledge service with strong user interaction have emerged, typically online encyclopedias, personal blogs, and forums. Among these online services, forums allow users to raise and discuss issues, share information, and post freely and simply, so they are highly timely and accepted by the majority of users. How to make full use of the data in financial forums, and how to organize and mine useful information from that massive data so that users can access it, is the main subject of this thesis. The work covers two aspects.

First, a vertical search engine for forums is built. Following the pipeline of a general search engine, the system implements in turn the spider module, the web data extraction and indexing modules, and the query sorting module. Because the target is a vertical search engine for financial forums, each part has its own characteristics. For example, in the spider module, the crawling strategy crawls the daily hot stocks more frequently to improve the overall timeliness of the system. In the query sorting module, the system provides not only relevance ranking, as a general search engine does, but also sorting by the number of hits, the number of replies, and the timeliness of the post.

Second, the thesis turns to data mining on financial forums to provide a more user-friendly, intelligent service. The main work includes text classification of forum posts. After word segmentation and text feature extraction, a naïve Bayesian algorithm is used to classify the data. An improved Bayesian algorithm based on an emotional dictionary is then proposed. The thesis uses knowledge from the HowNet database to effectively improve the performance and accuracy of classification…
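The classification step described in the abstract can be illustrated with a small sketch. The excerpt does not say how the emotional dictionary enters the Bayesian model, so the version below makes an assumption: words found in a sentiment lexicon are simply counted with a larger weight in a multinomial naïve Bayes classifier with Laplace smoothing. The names `LEXICON`, `LEXICON_WEIGHT`, and `LexiconNaiveBayes` are hypothetical, the toy lexicon stands in for HowNet, and the English tokens stand in for the output of a Chinese word segmenter.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy sentiment lexicon; the thesis draws on HowNet, which is not reproduced here.
LEXICON = {"rise", "profit", "fall", "loss"}
# Assumed boost applied to lexicon words; the thesis's actual weighting is not given in the abstract.
LEXICON_WEIGHT = 2.0


def weighted_counts(tokens):
    """Count tokens, giving lexicon words a larger weight than ordinary words."""
    counts = Counter()
    for tok in tokens:
        counts[tok] += LEXICON_WEIGHT if tok in LEXICON else 1.0
    return counts


class LexiconNaiveBayes:
    """Multinomial naive Bayes over weighted token counts, with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_docs = Counter(labels)        # number of documents per class
        self.word_counts = defaultdict(Counter)  # weighted word counts per class
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            wc = weighted_counts(tokens)
            self.word_counts[label].update(wc)
            self.vocab.update(wc)
        self.total = {c: sum(self.word_counts[c].values()) for c in self.class_docs}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c in self.class_docs:
            # Log prior plus weighted log likelihood of each token.
            score = math.log(self.class_docs[c] / n_docs)
            for tok, weight in weighted_counts(tokens).items():
                p = (self.word_counts[c][tok] + 1.0) / (self.total[c] + vocab_size)
                score += weight * math.log(p)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Toy training data: already-segmented posts with an opinion label.
train_docs = [
    ["stock", "rise", "profit"],
    ["buy", "rise"],
    ["stock", "fall", "loss"],
    ["sell", "fall"],
]
train_labels = ["bullish", "bullish", "bearish", "bearish"]
model = LexiconNaiveBayes().fit(train_docs, train_labels)
```

Calling `model.predict(["rise", "profit"])` on this toy data picks the bullish class, because the boosted lexicon words dominate the likelihood. A real system would segment Chinese posts first and would likely tune the weight on held-out data.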
