
Compositionally Aware Phylogenetic Beta-Diversity Measures Better Resolve Microbiomes Associated with Phenotype

Microbiome data have several specific characteristics (sparsity and compositionality) that introduce challenges in data analysis. The integration of prior information regarding the data structure, such as phylogenetic structure and repeated-measure study designs, into analysis, is an effective appro...


Bibliographic Details
Main authors: Martino, Cameron; McDonald, Daniel; Cantrell, Kalen; Dilmore, Amanda Hazel; Vázquez-Baeza, Yoshiki; Shenhav, Liat; Shaffer, Justin P.; Rahman, Gibraan; Armstrong, George; Allaband, Celeste; Song, Se Jin; Knight, Rob
Format: Online Article Text
Language: English
Published: American Society for Microbiology 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9238373/
https://www.ncbi.nlm.nih.gov/pubmed/35477286
http://dx.doi.org/10.1128/msystems.00050-22
_version_ 1784737036016549888
author Martino, Cameron
McDonald, Daniel
Cantrell, Kalen
Dilmore, Amanda Hazel
Vázquez-Baeza, Yoshiki
Shenhav, Liat
Shaffer, Justin P.
Rahman, Gibraan
Armstrong, George
Allaband, Celeste
Song, Se Jin
Knight, Rob
author_sort Martino, Cameron
collection PubMed
description Microbiome data have several specific characteristics (sparsity and compositionality) that introduce challenges in data analysis. Integrating prior information about the data structure, such as phylogenetic structure and repeated-measure study designs, into the analysis is an effective approach for revealing robust patterns in microbiome data. Past methods have addressed some but not all of these challenges and features: for example, robust principal-component analysis (RPCA) addresses sparsity and compositionality; compositional tensor factorization (CTF) addresses sparsity, compositionality, and repeated-measure study designs; and UniFrac incorporates phylogenetic information. Here, we introduce a strategy for incorporating phylogenetic information into RPCA and CTF. The resulting methods, phylo-RPCA and phylo-CTF, provide substantial improvements over state-of-the-art methods in terms of discriminatory power of underlying clustering ranging from the mode of delivery to adult human lifestyle. We demonstrate quantitatively that the addition of phylogenetic information improves effect size and classification accuracy in both data-driven simulated data and real microbiome data.
IMPORTANCE Microbiome data analysis can be difficult because of particular data features, some unavoidable and some due to technical limitations of DNA sequencing instruments. The first step in many analyses that ultimately reveal patterns of similarities and differences among sets of samples (e.g., separating samples from sick and healthy people or samples from seawater versus soil) is calculating the difference between each pair of samples. We introduce two new methods to calculate these differences that combine features of past methods, specifically by taking into account the principles that most types of microbes are not in most samples (sparsity), that abundances are relative rather than absolute (compositionality), and that all microbes have a shared evolutionary history (phylogeny). We show, using simulated and real data, that our new methods provide improved classification accuracy of ordinal sample clusters and increased effect size between sample groups on beta-diversity distances.
format Online
Article
Text
id pubmed-9238373
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher American Society for Microbiology
record_format MEDLINE/PubMed
spelling pubmed-9238373 2022-06-29 Compositionally Aware Phylogenetic Beta-Diversity Measures Better Resolve Microbiomes Associated with Phenotype mSystems Research Article American Society for Microbiology 2022-04-28 /pmc/articles/PMC9238373/ /pubmed/35477286 http://dx.doi.org/10.1128/msystems.00050-22 Text en Copyright © 2022 Martino et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/).
title Compositionally Aware Phylogenetic Beta-Diversity Measures Better Resolve Microbiomes Associated with Phenotype
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9238373/
https://www.ncbi.nlm.nih.gov/pubmed/35477286
http://dx.doi.org/10.1128/msystems.00050-22