A Hybrid Instance Selection Using Nearest-Neighbor for Cross-Project Defect Prediction



Ryu D, Jang JI, Baik J. A hybrid instance selection using nearest-neighbor for cross-project defect prediction. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 30(5): 969–980 Sept. 2015. DOI 10.1007/s11390-015-1575-5

Duksan Ryu, Jong-In Jang, and Jongmoon Baik, Member, ACM, IEEE

School of Computing, Korea Advanced Institute of Science and Technology, Yuseong-gu, Daejeon 305-701, Korea

E-mail: {dsryu, forestar0719, jbaik}@kaist.ac.kr

Received March 20, 2015; revised July 7, 2015.

Abstract

Software defect prediction (SDP) is an active research field in software engineering that aims to identify defect-prone modules. Thanks to SDP, limited testing resources can be effectively allocated to defect-prone modules. Although SDP requires sufficient local data within a company, there are cases where local data are not available, e.g., pilot projects. Companies without local data can employ cross-project defect prediction (CPDP), which uses external data to build classifiers. The major challenge of CPDP is the difference in distribution between training and test data. To tackle this, instances of source data similar to target data are selected to build classifiers. Software datasets also have a class imbalance problem: the ratio of the defective class to the clean class is very low. This imbalance usually lowers the performance of classifiers. We propose a Hybrid Instance Selection Using Nearest-Neighbor (HISNN) method that performs a hybrid classification, selectively learning local knowledge (via k-nearest neighbor) and global knowledge (via naïve Bayes). Instances having strong local knowledge are identified via nearest-neighbors with the same class label. Previous studies showed low PD (probability of detection) or high PF (p
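
The abstract's idea of selecting source instances that are similar to the target data can be illustrated with a generic nearest-neighbor filter. The sketch below is not the HISNN procedure, whose details are not given in the text above; it only shows one simple way such a selection could work. The function name `select_similar_instances`, the Euclidean distance, the value of `k`, and the toy data are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_similar_instances(source, target, k=2):
    """Keep source instances that are among the k nearest to some target instance.

    source: list of (features, label) pairs from other projects (labeled).
    target: list of feature vectors from the project under prediction (unlabeled).
    Hypothetical helper for illustration; not taken from the paper.
    """
    selected = set()
    for t in target:
        # Rank source indices by distance to this target instance.
        ranked = sorted(range(len(source)),
                        key=lambda i: euclidean(source[i][0], t))
        selected.update(ranked[:k])
    # Return the selected instances in their original order.
    return [source[i] for i in sorted(selected)]


# Toy data (assumed): label 1 = defective, 0 = clean.
source = [
    ((1.0, 1.0), 0),
    ((1.2, 0.9), 1),
    ((9.0, 9.0), 0),  # far from every target instance
    ((1.1, 1.3), 0),
]
target = [(1.0, 1.1), (1.2, 1.0)]

filtered = select_similar_instances(source, target, k=2)
print(filtered)
```

In this toy run the outlying source instance at (9.0, 9.0) is dropped, and a classifier would then be trained only on the remaining instances. In practice the features would usually be normalized first so that no single metric dominates the distance.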
