Variable selection in microbiome compositional data analysis
Though variable selection is one of the most relevant tasks in microbiome analysis, e.g. for the identification of microbial signatures, many studies still rely on methods that ignore the compositional nature of microbiome data. The applicability of compositional data analysis methods has been hampe...
Main authors: | Susin, Antoni; Wang, Yiwen; Lê Cao, Kim-Anh; Calle, M Luz |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671404/ https://www.ncbi.nlm.nih.gov/pubmed/33575585 http://dx.doi.org/10.1093/nargab/lqaa029 |
_version_ | 1783610922576642048 |
---|---|
author | Susin, Antoni; Wang, Yiwen; Lê Cao, Kim-Anh; Calle, M Luz |
author_facet | Susin, Antoni; Wang, Yiwen; Lê Cao, Kim-Anh; Calle, M Luz |
author_sort | Susin, Antoni |
collection | PubMed |
description | Though variable selection is one of the most relevant tasks in microbiome analysis, e.g. for the identification of microbial signatures, many studies still rely on methods that ignore the compositional nature of microbiome data. The applicability of compositional data analysis methods has been hampered by the availability of software and the difficulty in interpreting their results. This work is focused on three methods for variable selection that acknowledge the compositional structure of microbiome data: selbal, a forward selection approach for the identification of compositional balances, and clr-lasso and coda-lasso, two penalized regression models for compositional data analysis. This study highlights the link between these methods and brings out some limitations of the centered log-ratio transformation for variable selection. In particular, the fact that it is not subcompositionally consistent makes the microbial signatures obtained from clr-lasso not readily transferable. Coda-lasso is computationally efficient and suitable when the focus is the identification of the most associated microbial taxa. Selbal stands out when the goal is to obtain a parsimonious model with optimal prediction performance, but it is computationally greedy. We provide a reproducible vignette for the application of these methods that will enable researchers to fully leverage their potential in microbiome studies. |
format | Online Article Text |
id | pubmed-7671404 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-76714042021-02-10 Variable selection in microbiome compositional data analysis Susin, Antoni Wang, Yiwen Lê Cao, Kim-Anh Calle, M Luz NAR Genom Bioinform Methart Though variable selection is one of the most relevant tasks in microbiome analysis, e.g. for the identification of microbial signatures, many studies still rely on methods that ignore the compositional nature of microbiome data. The applicability of compositional data analysis methods has been hampered by the availability of software and the difficulty in interpreting their results. This work is focused on three methods for variable selection that acknowledge the compositional structure of microbiome data: selbal, a forward selection approach for the identification of compositional balances, and clr-lasso and coda-lasso, two penalized regression models for compositional data analysis. This study highlights the link between these methods and brings out some limitations of the centered log-ratio transformation for variable selection. In particular, the fact that it is not subcompositionally consistent makes the microbial signatures obtained from clr-lasso not readily transferable. Coda-lasso is computationally efficient and suitable when the focus is the identification of the most associated microbial taxa. Selbal stands out when the goal is to obtain a parsimonious model with optimal prediction performance, but it is computationally greedy. We provide a reproducible vignette for the application of these methods that will enable researchers to fully leverage their potential in microbiome studies. Oxford University Press 2020-05-13 /pmc/articles/PMC7671404/ /pubmed/33575585 http://dx.doi.org/10.1093/nargab/lqaa029 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. 
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Methart Susin, Antoni Wang, Yiwen Lê Cao, Kim-Anh Calle, M Luz Variable selection in microbiome compositional data analysis |
title | Variable selection in microbiome compositional data analysis |
title_sort | variable selection in microbiome compositional data analysis |
topic | Methart |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671404/ https://www.ncbi.nlm.nih.gov/pubmed/33575585 http://dx.doi.org/10.1093/nargab/lqaa029 |
work_keys_str_mv | AT susinantoni variableselectioninmicrobiomecompositionaldataanalysis AT wangyiwen variableselectioninmicrobiomecompositionaldataanalysis AT lecaokimanh variableselectioninmicrobiomecompositionaldataanalysis AT callemluz variableselectioninmicrobiomecompositionaldataanalysis |
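
The description above notes that the centered log-ratio (clr) transformation is not subcompositionally consistent. The following is a minimal sketch of what that means, not code from the article or its vignette: the four-taxon composition, the helper names `close` and `clr`, and the choice of subcomposition are all hypothetical and chosen only for illustration. It shows that a taxon's clr coordinate changes when the same sample is restricted to a subset of taxa, whereas a pairwise log-ratio between two retained taxa does not.

```python
import math

def close(parts):
    """Rescale positive parts so they sum to 1 (closure)."""
    total = sum(parts)
    return [p / total for p in parts]

def clr(composition):
    """Centered log-ratio: log of each part minus the mean of all logs."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical sample with four taxa (illustrative values only).
full = close([10.0, 20.0, 30.0, 40.0])
# The same sample restricted to the first three taxa and re-closed.
sub = close(full[:3])

clr_full = clr(full)
clr_sub = clr(sub)

# The clr coordinate of taxon 0 depends on which taxa are present.
shift = clr_sub[0] - clr_full[0]

# The log-ratio between taxa 0 and 1 is the same in both compositions.
ratio_full = math.log(full[0] / full[1])
ratio_sub = math.log(sub[0] / sub[1])

print("clr shift for taxon 0:", shift)
print("log-ratio difference:", ratio_full - ratio_sub)
```

Because each clr coordinate is defined relative to the geometric mean of every part in the composition, dropping taxa changes that reference point; this is one way to read the description's remark that signatures obtained from clr-lasso are not readily transferable.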