
CBEA: Competitive balances for taxonomic enrichment analysis

Research in human-associated microbiomes often involves the analysis of taxonomic count tables generated via high-throughput sequencing. It is difficult to apply statistical tools as the data is high-dimensional, sparse, and compositional. An approachable way to alleviate high-dimensionality and spa...


Bibliographic Details
Main authors: Nguyen, Quang P., Hoen, Anne G., Frost, H. Robert
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9154102/
https://www.ncbi.nlm.nih.gov/pubmed/35584140
http://dx.doi.org/10.1371/journal.pcbi.1010091
author Nguyen, Quang P.
Hoen, Anne G.
Frost, H. Robert
collection PubMed
description Research in human-associated microbiomes often involves the analysis of taxonomic count tables generated via high-throughput sequencing. It is difficult to apply statistical tools as the data is high-dimensional, sparse, and compositional. An approachable way to alleviate high-dimensionality and sparsity is to aggregate variables into pre-defined sets. Set-based analysis is ubiquitous in the genomics literature and has demonstrable impact on improving interpretability and power of downstream analysis. Unfortunately, there is a lack of sophisticated set-based analysis methods specific to microbiome taxonomic data, where current practice often employs abundance summation as a technique for aggregation. This approach prevents comparison across sets of different sizes, does not preserve inter-sample distances, and amplifies protocol bias. Here, we attempt to fill this gap with a new single-sample taxon enrichment method that uses a novel log-ratio formulation based on the competitive null hypothesis commonly used in the enrichment analysis literature. Our approach, titled competitive balances for taxonomic enrichment analysis (CBEA), generates sample-specific enrichment scores as the scaled log-ratio of the subcomposition defined by taxa within a set and the subcomposition defined by its complement. We provide sample-level significance testing by estimating an empirical null distribution of our test statistic with valid p-values. Herein, we demonstrate, using both real data applications and simulations, that CBEA controls for type I error, even under high sparsity and high inter-taxa correlation scenarios. Additionally, CBEA provides informative scores that can be inputs to downstream analyses such as prediction tasks.
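The score described in the abstract above can be illustrated with a minimal Python sketch. The abstract only states that the score is a scaled log-ratio between the subcomposition inside a set and its complement; the use of geometric means, the balance-style scaling factor, the pseudocount for zeros, and the raw permutation-based p-value are all assumptions made here for illustration, and the function names `cbea_scores` and `empirical_p_values` are hypothetical. This is not the authors' implementation.

```python
import numpy as np


def cbea_scores(counts, set_idx, pseudocount=1.0):
    """Per-sample balance-style score for one taxon set (illustrative sketch).

    counts  : (n_samples, n_taxa) array of non-negative counts
    set_idx : column indices of the taxa that belong to the set
    """
    # Assumption: a pseudocount is added so that zeros do not break the log.
    x = np.asarray(counts, dtype=float) + pseudocount
    n_taxa = x.shape[1]
    in_set = np.zeros(n_taxa, dtype=bool)
    in_set[np.asarray(set_idx)] = True
    k = int(in_set.sum())
    if k == 0 or k == n_taxa:
        raise ValueError("the set and its complement must both be non-empty")
    log_x = np.log(x)
    # The log of a geometric mean equals the mean of the logs.
    log_ratio = log_x[:, in_set].mean(axis=1) - log_x[:, ~in_set].mean(axis=1)
    # Assumption: scaling factor borrowed from isometric log-ratio balances.
    scale = np.sqrt(k * (n_taxa - k) / n_taxa)
    return scale * log_ratio


def empirical_p_values(counts, set_idx, n_perm=200, seed=0):
    """One-sided p-values from a column-permutation null (assumed scheme)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(counts)
    observed = cbea_scores(x, set_idx)
    # Shuffling taxon columns is equivalent to scoring random sets of equal size.
    null = np.concatenate([
        cbea_scores(x[:, rng.permutation(x.shape[1])], set_idx)
        for _ in range(n_perm)
    ])
    exceed = (null[None, :] >= observed[:, None]).sum(axis=1)
    # Add-one correction keeps every p-value strictly above zero.
    return (exceed + 1) / (null.size + 1)


demo_counts = np.array([[10, 10, 1, 1],
                        [1, 1, 10, 10],
                        [5, 5, 5, 5]])
demo_scores = cbea_scores(demo_counts, [0, 1])
demo_pvals = empirical_p_values(demo_counts, [0, 1], n_perm=50)
```

In this toy table the first sample is dominated by the set, the second by its complement, and the third is balanced, so the scores are positive, negative, and zero respectively. The sketch uses the raw tail fraction of the permuted scores; the abstract speaks of estimating an empirical null distribution, so the published method may differ in how that null is modeled.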
format Online
Article
Text
id pubmed-9154102
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9154102 2022-06-01 CBEA: Competitive balances for taxonomic enrichment analysis Nguyen, Quang P. Hoen, Anne G. Frost, H. Robert PLoS Comput Biol Research Article
Public Library of Science 2022-05-18 /pmc/articles/PMC9154102/ /pubmed/35584140 http://dx.doi.org/10.1371/journal.pcbi.1010091 Text en © 2022 Nguyen et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title CBEA: Competitive balances for taxonomic enrichment analysis
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9154102/
https://www.ncbi.nlm.nih.gov/pubmed/35584140
http://dx.doi.org/10.1371/journal.pcbi.1010091