Stable feature selection based on the ensemble L(1)-norm support vector machine for biomarker discovery

BACKGROUND: Lately, biomarker discovery has become one of the most significant research issues in the biomedical field. Owing to the presence of high-throughput technologies, genomic data, such as microarray data and RNA-seq, have become widely available. Many kinds of feature selection techniques h...

Bibliographic Details
Main authors:	Moon, Myungjin; Nakai, Kenta
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2016
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5260053/ https://www.ncbi.nlm.nih.gov/pubmed/28155664 http://dx.doi.org/10.1186/s12864-016-3320-z

_version_	1782499332995416064
author	Moon, Myungjin Nakai, Kenta
author_facet	Moon, Myungjin Nakai, Kenta
author_sort	Moon, Myungjin
collection	PubMed
description	BACKGROUND: Lately, biomarker discovery has become one of the most significant research issues in the biomedical field. Owing to the presence of high-throughput technologies, genomic data, such as microarray data and RNA-seq, have become widely available. Many kinds of feature selection techniques have been applied to retrieve significant biomarkers from these kinds of data. However, they tend to be noisy with high-dimensional features and consist of a small number of samples; thus, conventional feature selection approaches might be problematic in terms of reproducibility. RESULTS: In this article, we propose a stable feature selection method for high-dimensional datasets. We apply an ensemble L(1)-norm support vector machine to efficiently reduce irrelevant features, considering the stability of features. We define the stability score for each feature by aggregating the ensemble results, and utilize backward feature elimination on a purified feature set based on this score; therefore, it is possible to acquire an optimal set of features for performance without the need to set a specific threshold. The proposed methodology is evaluated by classifying the binary stage of renal clear cell carcinoma with RNA-seq data. CONCLUSION: A comparison with established algorithms, i.e., a fast correlation-based filter, random forest, and an ensemble version of an L(2)-norm support vector machine-based recursive feature elimination, enabled us to prove the superior performance of our method in terms of classification as well as stability in general. It is also shown that the proposed approach performs moderately on high-dimensional datasets consisting of a very large number of features and a smaller number of samples. The proposed approach is expected to be applicable to many other studies aimed at biomarker discovery.
format	Online Article Text
id	pubmed-5260053
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5260053 2017-01-26 Stable feature selection based on the ensemble L(1)-norm support vector machine for biomarker discovery Moon, Myungjin Nakai, Kenta BMC Genomics Research BACKGROUND: Lately, biomarker discovery has become one of the most significant research issues in the biomedical field. Owing to the presence of high-throughput technologies, genomic data, such as microarray data and RNA-seq, have become widely available. Many kinds of feature selection techniques have been applied to retrieve significant biomarkers from these kinds of data. However, they tend to be noisy with high-dimensional features and consist of a small number of samples; thus, conventional feature selection approaches might be problematic in terms of reproducibility. RESULTS: In this article, we propose a stable feature selection method for high-dimensional datasets. We apply an ensemble L(1)-norm support vector machine to efficiently reduce irrelevant features, considering the stability of features. We define the stability score for each feature by aggregating the ensemble results, and utilize backward feature elimination on a purified feature set based on this score; therefore, it is possible to acquire an optimal set of features for performance without the need to set a specific threshold. The proposed methodology is evaluated by classifying the binary stage of renal clear cell carcinoma with RNA-seq data. CONCLUSION: A comparison with established algorithms, i.e., a fast correlation-based filter, random forest, and an ensemble version of an L(2)-norm support vector machine-based recursive feature elimination, enabled us to prove the superior performance of our method in terms of classification as well as stability in general. It is also shown that the proposed approach performs moderately on high-dimensional datasets consisting of a very large number of features and a smaller number of samples. The proposed approach is expected to be applicable to many other studies aimed at biomarker discovery. BioMed Central 2016-12-22 /pmc/articles/PMC5260053/ /pubmed/28155664 http://dx.doi.org/10.1186/s12864-016-3320-z Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Moon, Myungjin Nakai, Kenta Stable feature selection based on the ensemble L(1)-norm support vector machine for biomarker discovery
title	Stable feature selection based on the ensemble L(1)-norm support vector machine for biomarker discovery
title_full	Stable feature selection based on the ensemble L(1)-norm support vector machine for biomarker discovery
title_fullStr	Stable feature selection based on the ensemble L(1)-norm support vector machine for biomarker discovery
title_full_unstemmed	Stable feature selection based on the ensemble L(1)-norm support vector machine for biomarker discovery
title_short	Stable feature selection based on the ensemble L(1)-norm support vector machine for biomarker discovery
title_sort	stable feature selection based on the ensemble l(1)-norm support vector machine for biomarker discovery
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5260053/ https://www.ncbi.nlm.nih.gov/pubmed/28155664 http://dx.doi.org/10.1186/s12864-016-3320-z
work_keys_str_mv	AT moonmyungjin stablefeatureselectionbasedontheensemblel1normsupportvectormachineforbiomarkerdiscovery AT nakaikenta stablefeatureselectionbasedontheensemblel1normsupportvectormachineforbiomarkerdiscovery
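
The abstract in this record describes aggregating ensemble results into a per-feature stability score and then running backward feature elimination on that score. The record does not give the exact definitions, so the following is only a minimal Python sketch under stated assumptions: the stability score is taken to be the fraction of ensemble runs that selected a feature, each run's selection is assumed to come from some sparse model (for example, the nonzero weights of an L1-regularized linear SVM fitted on a resampled dataset), and `stability_scores`, `backward_elimination`, and `toy_evaluate` are hypothetical names, not the authors' code.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Set, Tuple


def stability_scores(selections: Sequence[Set[str]]) -> Dict[str, float]:
    """Assumed score: fraction of ensemble runs in which a feature was selected."""
    counts = Counter(f for selected in selections for f in selected)
    n_runs = len(selections)
    return {feature: count / n_runs for feature, count in counts.items()}


def backward_elimination(
    scores: Dict[str, float],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Drop the least stable feature one at a time and keep the best subset.

    `evaluate` is a caller-supplied (hypothetical) function returning a
    performance estimate, e.g. cross-validated accuracy, for a feature subset.
    """
    # Most stable first; ties broken by name so the order is deterministic.
    ranked = sorted(scores, key=lambda f: (-scores[f], f))
    best_subset = list(ranked)
    best_perf = evaluate(best_subset)
    current = list(ranked)
    while len(current) > 1:
        current = current[:-1]          # remove the current least stable feature
        perf = evaluate(current)
        if perf >= best_perf:           # on ties, prefer the smaller subset
            best_subset, best_perf = list(current), perf
    return best_subset, best_perf


def toy_evaluate(subset: List[str]) -> float:
    """Toy stand-in for a classifier score: rewards g1/g2, penalizes the rest."""
    informative = {"g1", "g2"}
    hits = sum(1 for f in subset if f in informative)
    return hits - 0.1 * (len(subset) - hits)


if __name__ == "__main__":
    # Made-up selections from four ensemble runs (illustrative data only).
    runs = [{"g1", "g2", "g3"}, {"g1", "g2"}, {"g1", "g4"}, {"g1", "g2", "g3"}]
    scores = stability_scores(runs)
    subset, perf = backward_elimination(scores, toy_evaluate)
    print(scores)
    print(subset, perf)
```

Because the subset size is chosen by the evaluation function rather than by a cutoff on the score, this sketch mirrors the abstract's point that no specific threshold needs to be set; in practice `toy_evaluate` would be replaced by a real cross-validated classifier.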
