A Multicriteria Approach to Find Predictive and Sparse Models with Stable Feature Selection for High-Dimensional Data

Bibliographic Details
Main authors: Bommert, Andrea; Rahnenführer, Jörg; Lang, Michel
Format: Online, Article, Text
Language: English
Published: Hindawi, 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5556617/
https://www.ncbi.nlm.nih.gov/pubmed/28835769
http://dx.doi.org/10.1155/2017/7907163
collection PubMed
description Finding a good predictive model for a high-dimensional data set can be challenging. For genetic data, it is not only important to find a model with high predictive accuracy, but also that this model uses only a few features and that the selection of these features is stable. This is because, in bioinformatics, models are used not only for prediction but also for drawing biological conclusions, which makes the interpretability and reliability of the model crucial. We suggest using three target criteria when fitting a predictive model to a high-dimensional data set: the classification accuracy, the stability of the feature selection, and the number of chosen features. As it is unclear which measure is best for evaluating stability, we first compare a variety of stability measures. We conclude that the Pearson correlation has the best theoretical and empirical properties. We also find that the most important factor in a measure's stability assessment behaviour is whether it contains a correction for chance or for large numbers of chosen features. Then, we analyse Pareto fronts and conclude that it is possible to find models with a stable selection of few features without losing much predictive accuracy.
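The abstract names the Pearson correlation as the preferred stability measure and evaluates models on three criteria via Pareto fronts. The record does not include the authors' code, so the following is only a minimal sketch under stated assumptions: stability is taken to be the average pairwise Pearson correlation between the 0/1 selection indicator vectors of repeated feature-selection runs, and a model is treated as Pareto-optimal if no other model is at least as good on accuracy, stability and number of features and strictly better on one of them. The function names `pearson_stability` and `pareto_front`, the tuple layout, and the choice to reject runs that select no feature or all features are hypothetical illustrations, not the paper's implementation.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean


def pearson_stability(selections, p):
    """Average pairwise Pearson correlation of 0/1 feature-selection vectors.

    Hypothetical helper. selections: one set of selected feature indices per
    repeated run; p: total number of features in the data set.
    """
    if len(selections) < 2:
        raise ValueError("need at least two selections to assess stability")
    # Encode each run's selection as a binary indicator vector of length p.
    vectors = [[1 if j in s else 0 for j in range(p)] for s in selections]
    corrs = []
    for a, b in combinations(vectors, 2):
        ma, mb = fmean(a), fmean(b)
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        sa = sqrt(sum((x - ma) ** 2 for x in a))
        sb = sqrt(sum((y - mb) ** 2 for y in b))
        if sa == 0 or sb == 0:
            # Assumption: selecting 0 or all p features gives a constant vector,
            # so the correlation is undefined; this sketch rejects such input.
            raise ValueError("selection of 0 or all features: correlation undefined")
        corrs.append(cov / (sa * sb))
    return fmean(corrs)


def pareto_front(models):
    """Return the non-dominated models (hypothetical helper).

    models: (name, accuracy, stability, n_features) tuples; accuracy and
    stability are maximised, the number of selected features is minimised.
    """
    def dominates(m, o):
        _, acc_m, stab_m, nf_m = m
        _, acc_o, stab_o, nf_o = o
        no_worse = acc_m >= acc_o and stab_m >= stab_o and nf_m <= nf_o
        better = acc_m > acc_o or stab_m > stab_o or nf_m < nf_o
        return no_worse and better

    # A model never dominates itself, so comparing against the full list is safe.
    return [m for m in models if not any(dominates(o, m) for o in models)]


if __name__ == "__main__":
    # Two runs with disjoint selections out of p = 4 features.
    print(pearson_stability([{0, 1}, {2, 3}], p=4))  # -1.0
    models = [("A", 0.90, 0.80, 10), ("B", 0.85, 0.90, 5), ("C", 0.80, 0.70, 20)]
    print([m[0] for m in pareto_front(models)])  # ['A', 'B']
```

In the toy example, model C is dominated by A (lower accuracy, lower stability, more features), while A and B trade accuracy against stability and sparsity, so both stay on the front.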
id pubmed-5556617
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: Comput Math Methods Med (Research Article), Hindawi 2017, 2017-08-01; record pubmed-5556617, 2017-08-23
Copyright © 2017 Andrea Bommert et al. This is an open access article distributed under the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article