PDF300KB - Journal of Nanjing University (Natural Sciences).doc

Funding: National Natural Science Foundation of China; Natural Science Foundation of Chongqing (cstc2012jjA40032, cstc2013jcyjA40063); Open Fund of the Chongqing/Ministry of Information Industry Key Laboratory of Computer Network and Communication Technology (CY-CNCL-2010-05)
Received:
*Corresponding author, E-mail: dengwb@

A clustering-based sampling method for imbalanced data
(CRSSC2014 conference paper)

Zhu Yaqi 1, Deng Weibin 1,2*
(1. Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065; 2. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031)

Abstract: Many studies have shown that traditional classifiers, when applied to massive imbalanced data, are biased toward majority-class rules and therefore misclassify minority-class instances as majority-class. To address this problem, a classification learning algorithm based on a decompose-and-solve strategy is proposed. The algorithm first clusters the sample data. On the basis of the clusters, it repeatedly under-samples the dataset according to sample weights to produce balanced datasets, validates each balanced dataset, and raises the weights of the misclassified samples. The error rate of each base classifier is used to set that classifier's weight, and the base classifiers with better classification performance are selected for a weighted ensemble. Experiments show that the algorithm achieves a high minority-class accuracy and minority-class F-measure, while substantially reducing the size of the training set.

Keywords: machine learning, imbalanced data, ensemble learning, under-sampling
CLC number: TP391.9
Document code: A

A method using clustering and sampling approach for imbalance data

Zhu Yaqi 1, Deng Weibin 1,2*
(1. Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China; 2. School of Information Science and Technology, Southwest Jiaotong University, Chengdu, 610031, China)

Abstract: Classification is an important research topic in machine learning. Current classification methods are relatively mature and generally perform well on balanced data. In the real world, however, class proportions are often unbalanced. Traditional classifiers are designed on the premise of balanced data and always pursue the best overall accuracy. Using them to classify massive unbalanced data therefore causes a sharp drop in performance, and the resulting classification is heavily biased. Most commonly, the recognition rate of minority samples is far lower than that of majority samples, so samples that belong to the minority class are mistakenly assigned to the majority class. To address this problem, unbalanced datasets can be transformed into balanced datasets by under-sampling, so as to reduce the degree of imbalance and allow the traditional classifiers to a