- Posted 2017-08-20, Tianjin
A Clustering-Based Sampling Method for Imbalanced Data - Journal of Nanjing University (Natural Sciences)
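A Clustering-Based Sampling Method for Imbalanced Data
Zhu Yaqi1, Deng Weibin1,2*
(1. Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065;
2. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031)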
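Abstract: Many studies have shown that traditional classifiers, when applied to massive imbalanced data, are biased toward majority-class rules, so minority-class instances are misclassified as majority-class. To address this problem, a learning and classification algorithm based on divide-and-solve is proposed. The algorithm first clusters the sample data. On the basis of the clusters, it repeatedly under-samples the dataset according to sample weights to produce balanced datasets, validates on each balanced dataset, and raises the weights of misclassified samples. The error rate of each base classifier is taken into account as that classifier's weight, and the base classifiers with better classification performance are selected for weighted ensembling. Experiments show that the algorithm achieves high minority-class accuracy and minority-class F-measure, while substantially reducing the size of the training set.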
Keywords: machine learning, imbalanced data, ensemble learning, under-sampling
CLC number: TP391.9  Document code: A
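The pipeline outlined in the abstract (cluster, weighted under-sampling, reweighting of misclassified samples, error-weighted ensemble) can be sketched roughly as below. This is only an illustrative reading of the abstract, not the authors' implementation. Several details are assumptions that the abstract does not specify: clustering only the majority class with k-means, a per-cluster quota proportional to cluster size, a nearest-centroid base classifier, validating each round on the full training set, doubling the weight of misclassified samples, a log-odds vote weight, and the `max_error` selection threshold. All function and parameter names are hypothetical.

```python
import math
import random

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centre if a cluster empties
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def weighted_draw(items, weights, n, rng):
    """Draw n distinct items, each pick proportional to its current weight."""
    pool, picked = list(items), []
    for _ in range(n):
        j = rng.choices(range(len(pool)), weights=[weights[i] for i in pool])[0]
        picked.append(pool.pop(j))
    return picked

def fit_centroids(X, y):
    """Assumed base learner: one mean vector per class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        model[label] = tuple(sum(col) / len(rows) for col in zip(*rows))
    return model

def classify(model, x):
    return min(model, key=lambda label: math.dist(x, model[label]))

def train(X, y, minority=1, k=3, rounds=5, max_error=0.5, seed=0):
    rng = random.Random(seed)
    maj = [i for i, t in enumerate(y) if t != minority]
    mino = [i for i, t in enumerate(y) if t == minority]
    clusters = kmeans([X[i] for i in maj], k, rng)   # cluster the majority class once
    weights = {i: 1.0 for i in maj}                  # sampling weights, majority only
    ensemble = []
    for _ in range(rounds):
        chosen = []
        for c in range(k):
            members = [i for i, lab in zip(maj, clusters) if lab == c]
            if not members:
                continue
            # Assumed quota: each cluster contributes in proportion to its size,
            # so the draw totals roughly the minority-class count.
            quota = max(1, round(len(mino) * len(members) / len(maj)))
            chosen += weighted_draw(members, weights, min(quota, len(members)), rng)
        idx = chosen + mino                          # roughly balanced training subset
        model = fit_centroids([X[i] for i in idx], [y[i] for i in idx])
        wrong = [i for i in range(len(y)) if classify(model, X[i]) != y[i]]
        err = len(wrong) / len(y)
        for i in wrong:
            if i in weights:
                weights[i] *= 2.0                    # assumed boost for misclassified samples
        if err < max_error:                          # keep only the better base classifiers
            ensemble.append((model, math.log((1 - err + 1e-9) / (err + 1e-9))))
    return ensemble

def predict(ensemble, x):
    """Weighted vote over the retained base classifiers."""
    votes = {}
    for model, alpha in ensemble:
        label = classify(model, x)
        votes[label] = votes.get(label, 0.0) + alpha
    return max(votes, key=votes.get)

# Toy data: 60 majority points near the origin, 6 minority points near (5, 5).
demo_rng = random.Random(1)
X = [(demo_rng.uniform(-1, 1), demo_rng.uniform(-1, 1)) for _ in range(60)]
X += [(demo_rng.uniform(4, 6), demo_rng.uniform(4, 6)) for _ in range(6)]
y = [0] * 60 + [1] * 6
ensemble = train(X, y)
print(len(ensemble), predict(ensemble, (5.0, 5.0)), predict(ensemble, (0.0, 0.0)))
```

Each round trains on roughly as many majority samples as there are minority samples, which illustrates how this kind of scheme shrinks the training set seen by each base classifier. Any other base learner or weighting rule could be substituted without changing the overall structure.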
A clustering-based sampling method for imbalanced data
Zhu Yaqi1, Deng Weibin1,2*
(1. Chongqing Key Laboratory of Computational Intelligence, Chongqing University of
Posts and Telecommunications, Chongqing, 400065, China;
2. School of Information Science and Technology, Southwest Jiaotong University,
Chengdu, 610031, China)
Abstract: Classification is an important research topic in machine learning. Current classification methods are relatively mature and generally perform well on balanced data. In the real world, however, class proportions are often imbalanced. Traditional classifiers are designed on the premise of balanced data and pursue the best overall accuracy, so applying them to massive imbalanced data sharply degrades their performance and yields strongly biased results. Most commonly, the recognition rate for minority-class samples is far lower than that for majority-class samples, so samples that belong to the minority class are mistakenly assigned to the majority class. To address this problem, an imbalanced dataset can be converted into balanced datasets by under-sampling, which reduces the degree of imbalance and lets traditional classifiers perform well. However, under-sampling can discard important information, and applying a clustering algorithm can offset this loss. Meanwhile, th