
The Influence of Feature Selection Methods on Accuracy, Stability and Interpretability of Molecular Signatures

Biomarker discovery from high-dimensional data is a crucial problem with enormous applications in biology and medicine. It is also extremely challenging from a statistical viewpoint, but surprisingly few studies have investigated the relative strengths and weaknesses of the plethora of existing feat...


Bibliographic Details
Main authors: Haury, Anne-Claire; Gestraud, Pierre; Vert, Jean-Philippe
Format: Online Article Text
Language: English
Published: Public Library of Science 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3244389/
https://www.ncbi.nlm.nih.gov/pubmed/22205940
http://dx.doi.org/10.1371/journal.pone.0028210
collection PubMed
description Biomarker discovery from high-dimensional data is a crucial problem with enormous applications in biology and medicine. It is also extremely challenging from a statistical viewpoint, but surprisingly few studies have investigated the relative strengths and weaknesses of the plethora of existing feature selection methods. In this study we compare [Image: see text] feature selection methods on [Image: see text] public gene expression datasets for breast cancer prognosis, in terms of predictive performance, stability and functional interpretability of the signatures they produce. We observe that the feature selection method has a significant influence on the accuracy, stability and interpretability of signatures. Surprisingly, complex wrapper and embedded methods generally do not outperform simple univariate feature selection methods, and ensemble feature selection has generally no positive effect. Overall a simple Student's t-test seems to provide the best results.
format Online
Article
Text
id pubmed-3244389
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3244389 2011-12-28 PLoS One Research Article Public Library of Science 2011-12-21 /pmc/articles/PMC3244389/ /pubmed/22205940 http://dx.doi.org/10.1371/journal.pone.0028210 Text en Haury et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title The Influence of Feature Selection Methods on Accuracy, Stability and Interpretability of Molecular Signatures
topic Research Article