Research on Domain Adaptation Methods for Sentiment Classification: Graduation Thesis in Computer Application Technology (.docx)

About 55,900 characters
About 52 pages
Published 2019-05-11, Shanghai

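Research on Domain Adaptation Methods for Sentiment Classification

Abstract

With the emergence of blogs, product reviews, and similar content on the web, sentiment classification has become an important and challenging problem. Sentiment classification attempts to automatically determine the sentiment polarity (for example, positive or negative) that a user expresses in text, and it plays an increasingly important role in areas such as e-commerce and public opinion analysis. However, users express sentiment in many different ways, and data distributions differ markedly between domains, so classification accuracy is easily limited by the domain the data comes from. For a sentiment classification problem in a new domain, traditional machine learning methods can only build a model by labeling new training data, which usually consumes considerable manpower and resources. We therefore study domain adaptation for sentiment classification from two directions, constructing a unified feature space across domains and ensemble classification, and propose LTF, a feature selection algorithm based on the log-likelihood ratio, and CEC, a co-training ensemble decision algorithm based on confidence probability. The main contributions are as follows:

(1) LTF (log-likelihood ratio term frequency), a feature selection method for multi-domain sentiment classification. It uses data from both the source domain and the target domain, together with term-frequency and log-likelihood-ratio statistics, to select features that are strongly polar in the source domain and influential in the target domain. The resulting common feature space reduces the distribution gap between the source and target domains and facilitates cross-domain knowledge transfer.

(2) CEC (Confident Ensemble Classifier), a multi-domain ensemble algorithm based on confidence probability. Drawing on the ideas of self-training and co-training, it uses confidence probabilities to pre-label data while combining the base classifiers, which effectively improves classification accuracy in the target domain. Extensive experiments on sentiment datasets show that CEC does improve classification accuracy in the target domain.

Keywords: data mining; machine learning; sentiment classification; domain adaptation

Domain Adaptation of Sentiment Classification

ABSTRACT

As blogs and product reviews spring up, sentiment classification has become a challenging problem. Sentiment classification, which aims to identify one's sentiment polarity, is playing an increasingly important role in e-commerce and public opinion analysis. However, the ways of expressing sentiment vary widely and data distributions differ across domains, so sentiment classification tends to be affected by the domain. To solve a sentiment classification problem in a new domain, traditional machine learning methods need newly labeled training data, which costs a lot of manpower and material resources. Thus, we propose two methods, an LLR-based feature selection method and a confidence-probability-based ensemble method, to implement sentiment domain adaptation from the aspects of feature space and ensemble strategy. A novel feature selection method, named LTF, is proposed. This method creates a common feature space for both source do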
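The preview does not include the LTF formula itself, so the following is only a rough sketch of the idea the abstract describes: rank terms by how strongly they are tied to polarity in the labeled source domain (log-likelihood ratio) and by how often they occur in the unlabeled target domain (term frequency). The function names `llr` and `select_features`, the multiplicative combination with `log1p`, and the toy data are all assumptions made for illustration, not the thesis's actual definition.

```python
import math
from collections import Counter


def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / n
                g += o * math.log(o / expected)
    return 2.0 * g


def select_features(source_docs, source_labels, target_docs, k):
    """Return the k source-vocabulary terms with the highest combined score.

    ASSUMPTION: score = LLR(term, label in source) * log(1 + target term frequency).
    The thesis may combine the two statistics differently.
    """
    n_pos = sum(1 for y in source_labels if y == 1)
    n_neg = len(source_labels) - n_pos

    # Document frequency of each term in positive / negative source documents.
    pos_df, neg_df = Counter(), Counter()
    for tokens, y in zip(source_docs, source_labels):
        (pos_df if y == 1 else neg_df).update(set(tokens))

    # Raw term frequency in the unlabeled target domain.
    target_tf = Counter(t for tokens in target_docs for t in tokens)

    scores = {}
    for term in set(pos_df) | set(neg_df):
        a, b = pos_df[term], neg_df[term]
        polarity = llr(a, b, n_pos - a, n_neg - b)
        scores[term] = polarity * math.log1p(target_tf[term])

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


# Toy data (hypothetical): electronics reviews as source, movie reviews as target.
source_docs = [["great", "battery"], ["great", "screen"], ["great", "price"],
               ["poor", "battery"], ["poor", "screen"], ["poor", "price"]]
source_labels = [1, 1, 1, 0, 0, 0]
target_docs = [["great", "plot"], ["poor", "plot"], ["great", "acting"]]

top = select_features(source_docs, source_labels, target_docs, k=2)
```

In this toy run, domain-specific words such as "battery" score zero because they carry no polarity in the source domain and never appear in the target domain, while "great" and "poor" survive in the shared feature space. Note that the log-likelihood ratio measures the strength of the association with the label, not its direction.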
