Utilizing stability criteria in choosing feature selection methods yields reproducible results in microbiome data

Feature selection is indispensable in microbiome data analysis, but it can be particularly challenging as microbiome data sets are high dimensional, underdetermined, sparse and compositional. Great efforts have recently been made on developing new methods for feature selection that handle the above...

Bibliographic Details
Main Authors:	Jiang, Lingjing; Haiminen, Niina; Carrieri, Anna‐Paola; Huang, Shi; Vázquez‐Baeza, Yoshiki; Parida, Laxmi; Kim, Ho‐Cheol; Swafford, Austin D.; Knight, Rob; Natarajan, Loki
Format:	Online Article Text
Language:	English
Published:	John Wiley and Sons Inc. 2021
Subjects:	Biometric Practice
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9787628/ https://www.ncbi.nlm.nih.gov/pubmed/33914902 http://dx.doi.org/10.1111/biom.13481

collection	PubMed
description	Feature selection is indispensable in microbiome data analysis, but it can be particularly challenging as microbiome data sets are high dimensional, underdetermined, sparse and compositional. Great efforts have recently been made on developing new methods for feature selection that handle the above data characteristics, but almost all methods were evaluated based on performance of model predictions. However, little attention has been paid to address a fundamental question: how appropriate are those evaluation criteria? Most feature selection methods often control the model fit, but the ability to identify meaningful subsets of features cannot be evaluated simply based on the prediction accuracy. If tiny changes to the data would lead to large changes in the chosen feature subset, then many selected features are likely to be a data artifact rather than real biological signal. This crucial need of identifying relevant and reproducible features motivated the reproducibility evaluation criterion such as Stability, which quantifies how robust a method is to perturbations in the data. In our paper, we compare the performance of popular model prediction metrics (MSE or AUC) with proposed reproducibility criterion Stability in evaluating four widely used feature selection methods in both simulations and experimental microbiome applications with continuous or binary outcomes. We conclude that Stability is a preferred feature selection criterion over model prediction metrics because it better quantifies the reproducibility of the feature selection method.
id	pubmed-9787628
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-9787628 2022-12-28 Biometrics John Wiley and Sons Inc. 2021-05-19 2022-09 /pmc/articles/PMC9787628/ /pubmed/33914902 http://dx.doi.org/10.1111/biom.13481 Text en © 2021 The Authors. Biometrics published by Wiley Periodicals LLC on behalf of International Biometric Society. This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
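
The abstract describes Stability as a measure of how robust a feature selection method is to perturbations in the data, but this record does not give a formula. As a rough illustration only, the sketch below assumes one frequency-based estimator of the kind proposed by Nogueira, Sechidis and Brown (2018); whether the paper uses exactly this form is an assumption here. The function name `nogueira_stability` and the input layout (one row per resampled data set, one column per feature, 1 meaning "selected") are hypothetical choices made for the example.

```python
import numpy as np


def nogueira_stability(Z):
    """Estimate selection stability from a binary selection matrix.

    Z has shape (M, d): M resampled data sets (e.g. bootstrap samples),
    d candidate features; Z[i, f] == 1 if feature f was selected on
    resample i. Assumed estimator (not confirmed by this record):
        1 - mean_f(s_f^2) / ((k_bar / d) * (1 - k_bar / d))
    where s_f^2 is the unbiased sample variance of column f and k_bar
    is the average number of selected features per resample.
    """
    Z = np.asarray(Z, dtype=float)
    M, d = Z.shape
    if M < 2:
        raise ValueError("need at least two resamples")

    p_hat = Z.mean(axis=0)                   # selection frequency per feature
    s2 = M / (M - 1) * p_hat * (1 - p_hat)   # unbiased Bernoulli variance
    k_bar = Z.sum(axis=1).mean()             # average subset size

    denom = (k_bar / d) * (1 - k_bar / d)
    if denom == 0:
        raise ValueError("undefined when no or all features are always selected")
    return 1.0 - s2.mean() / denom


# Identical subsets on every resample: fully stable.
same = [[1, 1, 0, 0]] * 3
print(nogueira_stability(same))      # 1.0

# Two resamples with disjoint subsets: as unstable as M = 2 allows.
disjoint = [[1, 1, 0, 0], [0, 0, 1, 1]]
print(nogueira_stability(disjoint))  # -1.0
```

In practice each row of `Z` would come from rerunning a feature selection method on a perturbed copy of the data, so a method can predict well (low MSE or high AUC) and still score poorly here if it picks different features each time.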


Similar Items