
Variable selection in microbiome compositional data analysis

Bibliographic details
Main authors: Susin, Antoni; Wang, Yiwen; Lê Cao, Kim-Anh; Calle, M Luz
Format: Online Article Text
Language: English
Published: Oxford University Press, 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671404/
https://www.ncbi.nlm.nih.gov/pubmed/33575585
http://dx.doi.org/10.1093/nargab/lqaa029
author Susin, Antoni
Wang, Yiwen
Lê Cao, Kim-Anh
Calle, M Luz
collection PubMed
description Though variable selection is one of the most relevant tasks in microbiome analysis, e.g. for the identification of microbial signatures, many studies still rely on methods that ignore the compositional nature of microbiome data. The applicability of compositional data analysis methods has been hampered by the availability of software and the difficulty in interpreting their results. This work is focused on three methods for variable selection that acknowledge the compositional structure of microbiome data: selbal, a forward selection approach for the identification of compositional balances, and clr-lasso and coda-lasso, two penalized regression models for compositional data analysis. This study highlights the link between these methods and brings out some limitations of the centered log-ratio transformation for variable selection. In particular, the fact that it is not subcompositionally consistent makes the microbial signatures obtained from clr-lasso not readily transferable. Coda-lasso is computationally efficient and suitable when the focus is the identification of the most associated microbial taxa. Selbal stands out when the goal is to obtain a parsimonious model with optimal prediction performance, but it is computationally greedy. We provide a reproducible vignette for the application of these methods that will enable researchers to fully leverage their potential in microbiome studies.
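The abstract's claim that the centered log-ratio (clr) transformation is not subcompositionally consistent can be illustrated with a small sketch. It assumes the standard definition clr(x)_j = log x_j - mean_k log x_k. The helper names (`closure`, `clr`, `log_contrast`) and the toy counts are hypothetical and are not taken from the paper or its vignette. The last step shows the zero-sum log-contrast form that penalized log-contrast approaches such as coda-lasso are generally built on; treat this as an assumption about the model family, not as the authors' implementation.

```python
import math


def closure(parts):
    """Rescale positive parts so they sum to 1 (relative abundances)."""
    total = sum(parts)
    return [x / total for x in parts]


def clr(parts):
    """Centered log-ratio: log of each part minus the mean log (log of the geometric mean)."""
    logs = [math.log(x) for x in parts]
    mean_log = sum(logs) / len(logs)
    return [lx - mean_log for lx in logs]


def log_contrast(parts, beta):
    """Linear predictor sum_j beta_j * log(x_j); scale-invariant when sum(beta) == 0."""
    return sum(b * math.log(x) for b, x in zip(beta, parts))


# Hypothetical toy counts for taxa A, B, C, D (illustrative only, not data from the study).
full = [10.0, 20.0, 40.0, 30.0]
sub = full[:3]  # subcomposition that drops taxon D

# The clr value of taxon A changes when taxon D is removed,
# because the geometric mean in the denominator changes.
clr_full = clr(closure(full))
clr_sub = clr(closure(sub))
print("clr(A), full vs sub:", round(clr_full[0], 4), round(clr_sub[0], 4))

# A pairwise log-ratio such as log(A/B) is the same in both.
lr_full = math.log(full[0] / full[1])
lr_sub = math.log(sub[0] / sub[1])
print("log(A/B), full vs sub:", lr_full, lr_sub)

# With zero-sum coefficients, raw counts and relative abundances
# give the same linear predictor (the total-count term cancels).
beta = [1.0, -0.5, -0.5, 0.0]
print(log_contrast(full, beta), log_contrast(closure(full), beta))
```

Only the ratio-based quantities are unaffected by dropping a taxon, which is consistent with the abstract's point that signatures expressed in clr coordinates may not transfer directly to a study that profiles a different set of taxa.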
format Online
Article
Text
id pubmed-7671404
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
journal NAR Genom Bioinform, Oxford University Press, 2020-05-13
rights © The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
license http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Variable selection in microbiome compositional data analysis
topic Methart
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671404/
https://www.ncbi.nlm.nih.gov/pubmed/33575585
http://dx.doi.org/10.1093/nargab/lqaa029