Knowl Inf Syst (2011) 26:487–500
DOI 10.1007/s10115-010-0288-x
REGULAR PAPER
A two-stage gene selection scheme utilizing MRMR filter
and GA wrapper
Ali El Akadi · Aouatif Amine ·
Abdeljalil El Ouardighi · Driss Aboutajdine
Received: 17 February 2009 / Revised: 4 January 2010 / Accepted: 16 January 2010 /
Published online: 10 March 2010
© Springer-Verlag London Limited 2010
Abstract Gene expression data usually contain a large number of genes but a small number
of samples. Feature selection for gene expression data aims at finding a set of genes that
best discriminates biological samples of different types. In this paper, we propose a two-stage
selection algorithm for genomic data that combines MRMR (Minimum Redundancy–Maximum
Relevance) and GA (Genetic Algorithm). In the first stage, MRMR is used to filter out
noisy and redundant genes in high-dimensional microarray data. In the second stage, the GA
uses the classifier accuracy as a fitness function to select the highly discriminating genes. The
proposed method is tested for tumor classification on five open datasets (NCI, Lymphoma,
Lung, Leukemia and Colon) using Support Vector Machine (SVM) and Naïve Bayes (NB)
classifiers. A comparison of MRMR-GA with the MRMR filter alone and the GA wrapper alone
shows that our method is able to find the smallest gene subset that achieves the highest
classification accuracy in leave-one-out cross-validation (LOOCV).
Keywords Feature selection · Genetic algorithm · MRMR · Support Vector Machine ·
Naïve Bayes classifier · LOOCV
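The two-stage scheme summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: gene values are already discretized into three levels, the MRMR score is taken in its difference form (relevance minus mean redundancy), the classifier is a small categorical naive Bayes with Laplace smoothing, and ties in LOOCV accuracy are broken in favor of fewer genes. All function names, GA parameters, and the toy data are hypothetical.

```python
import math
import random
from collections import Counter

def mutual_info(a, b):
    """Mutual information (nats) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log(c * n / (pa[u] * pb[v]))
               for (u, v), c in pab.items())

def mrmr(X, y, k):
    """Stage 1 (filter): greedily pick k genes by relevance minus mean redundancy."""
    cols = [list(col) for col in zip(*X)]
    relevance = [mutual_info(col, y) for col in cols]
    selected = [max(range(len(cols)), key=lambda j: relevance[j])]
    while len(selected) < k:
        def score(j):
            redundancy = sum(mutual_info(cols[j], cols[s]) for s in selected)
            return relevance[j] - redundancy / len(selected)
        rest = [j for j in range(len(cols)) if j not in selected]
        selected.append(max(rest, key=score))
    return selected

def nb_predict(train_X, train_y, x, genes):
    """Categorical naive Bayes restricted to `genes` (assumes 3 levels per gene)."""
    best, best_lp = None, -math.inf
    for c, nc in Counter(train_y).items():
        lp = math.log(nc / len(train_y))
        for g in genes:
            hits = sum(1 for r, t in zip(train_X, train_y) if t == c and r[g] == x[g])
            lp += math.log((hits + 1) / (nc + 3))  # Laplace smoothing
        if lp > best_lp:
            best, best_lp = c, lp
    return best

def loocv_accuracy(X, y, genes):
    """Leave-one-out accuracy of the naive Bayes classifier on a gene subset."""
    if not genes:
        return 0.0
    correct = 0
    for i in range(len(X)):
        correct += nb_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], genes) == y[i]
    return correct / len(X)

def ga_wrapper(X, y, candidates, pop_size=20, generations=15, p_mut=0.1, seed=0):
    """Stage 2 (wrapper): evolve bit masks over `candidates`; needs >= 2 candidates."""
    rng = random.Random(seed)
    n = len(candidates)
    def decode(mask):
        return [g for g, bit in zip(candidates, mask) if bit]
    def fitness(mask):
        # LOOCV accuracy first; fewer genes as tie-breaker (an assumption)
        return (loocv_accuracy(X, y, decode(mask)), -sum(mask))
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]          # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            children.append([bit ^ (rng.random() < p_mut) for bit in a[:cut] + b[cut:]])
        pop = parents + children
    best = max(pop, key=fitness)
    return decode(best), fitness(best)[0]

# Toy data: 12 samples, 5 discretized genes (hypothetical values).
y = [0] * 6 + [1] * 6
g0 = [-1, -1, -1, -1, -1, 1, 1, 1, 1, 1, 1, -1]   # informative
g1 = list(g0)                                      # redundant copy of g0
g2 = [-1, -1, -1, -1, 1, -1, 1, 1, 1, 1, -1, 1]   # informative, errs elsewhere
g3 = [0, 1, -1, 0, 1, -1, 0, 1, -1, 0, 1, -1]     # noise
g4 = [1, 1, 0, 0, -1, -1, 1, 1, 0, 0, -1, -1]     # noise
X = [list(row) for row in zip(g0, g1, g2, g3, g4)]

candidates = mrmr(X, y, 3)
genes, acc = ga_wrapper(X, y, candidates)
print(candidates, genes, acc)
```

On this toy data the filter stage skips the duplicated gene `g1` in favor of the complementary gene `g2`, which is the redundancy-avoiding behavior that motivates running MRMR before the more expensive wrapper search.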
1 Introduction
In recent years, the development of microarray technology has made it possible to analyze
thousands or tens of thousands of genes simultaneously. However, the major problem in this
analysis is the huge number of genes compared to the limited number of samples [35]. Most
classification algorithms suffer from such a high-dimensional input space. Furthermore, most
of the genes in arrays are irrelevant or redundant to some specifie