CBEA: Competitive balances for taxonomic enrichment analysis
Research in human-associated microbiomes often involves the analysis of taxonomic count tables generated via high-throughput sequencing. It is difficult to apply statistical tools as the data is high-dimensional, sparse, and compositional. An approachable way to alleviate high-dimensionality and spa...
Main authors: | Nguyen, Quang P.; Hoen, Anne G.; Frost, H. Robert |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9154102/ https://www.ncbi.nlm.nih.gov/pubmed/35584140 http://dx.doi.org/10.1371/journal.pcbi.1010091 |
_version_ | 1784717969275748352 |
---|---|
author | Nguyen, Quang P. Hoen, Anne G. Frost, H. Robert |
author_facet | Nguyen, Quang P. Hoen, Anne G. Frost, H. Robert |
author_sort | Nguyen, Quang P. |
collection | PubMed |
description | Research in human-associated microbiomes often involves the analysis of taxonomic count tables generated via high-throughput sequencing. It is difficult to apply statistical tools as the data is high-dimensional, sparse, and compositional. An approachable way to alleviate high-dimensionality and sparsity is to aggregate variables into pre-defined sets. Set-based analysis is ubiquitous in the genomics literature and has demonstrable impact on improving interpretability and power of downstream analysis. Unfortunately, there is a lack of sophisticated set-based analysis methods specific to microbiome taxonomic data, where current practice often employs abundance summation as a technique for aggregation. This approach prevents comparison across sets of different sizes, does not preserve inter-sample distances, and amplifies protocol bias. Here, we attempt to fill this gap with a new single-sample taxon enrichment method that uses a novel log-ratio formulation based on the competitive null hypothesis commonly used in the enrichment analysis literature. Our approach, titled competitive balances for taxonomic enrichment analysis (CBEA), generates sample-specific enrichment scores as the scaled log-ratio of the subcomposition defined by taxa within a set and the subcomposition defined by its complement. We provide sample-level significance testing by estimating an empirical null distribution of our test statistic with valid p-values. Herein, we demonstrate, using both real data applications and simulations, that CBEA controls for type I error, even under high sparsity and high inter-taxa correlation scenarios. Additionally, CBEA provides informative scores that can be inputs to downstream analyses such as prediction tasks. |
format | Online Article Text |
id | pubmed-9154102 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-9154102 2022-06-01 CBEA: Competitive balances for taxonomic enrichment analysis Nguyen, Quang P. Hoen, Anne G. Frost, H. Robert PLoS Comput Biol Research Article
Public Library of Science 2022-05-18 /pmc/articles/PMC9154102/ /pubmed/35584140 http://dx.doi.org/10.1371/journal.pcbi.1010091 Text en © 2022 Nguyen et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Nguyen, Quang P. Hoen, Anne G. Frost, H. Robert CBEA: Competitive balances for taxonomic enrichment analysis |
title | CBEA: Competitive balances for taxonomic enrichment analysis |
title_full | CBEA: Competitive balances for taxonomic enrichment analysis |
title_fullStr | CBEA: Competitive balances for taxonomic enrichment analysis |
title_full_unstemmed | CBEA: Competitive balances for taxonomic enrichment analysis |
title_short | CBEA: Competitive balances for taxonomic enrichment analysis |
title_sort | cbea: competitive balances for taxonomic enrichment analysis |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9154102/ https://www.ncbi.nlm.nih.gov/pubmed/35584140 http://dx.doi.org/10.1371/journal.pcbi.1010091 |
work_keys_str_mv | AT nguyenquangp cbeacompetitivebalancesfortaxonomicenrichmentanalysis AT hoenanneg cbeacompetitivebalancesfortaxonomicenrichmentanalysis AT frosthrobert cbeacompetitivebalancesfortaxonomicenrichmentanalysis |
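
The description above characterizes the CBEA enrichment score as a scaled log-ratio between the taxa inside a set and the taxa in its complement. The sketch below illustrates one common way to write such a balance for a single sample: the difference of mean log abundances, multiplied by a size-dependent factor. It is not taken from the record. The function name `balance_score`, the pseudocount used to handle zero counts, and the specific scaling factor (borrowed from isometric log-ratio balances) are all assumptions made for illustration, and the significance-testing step mentioned in the description is not covered.

```python
import math


def balance_score(counts, in_set, pseudocount=1.0):
    """Scaled log-ratio of a taxon set against its complement for one sample.

    counts      -- raw taxon counts for a single sample
    in_set      -- booleans, True where the taxon belongs to the set
    pseudocount -- added to every count so zeros can be log-transformed
                   (an assumption of this sketch, not stated in the record)
    """
    if len(counts) != len(in_set):
        raise ValueError("counts and in_set must have the same length")

    logs_in = [math.log(c + pseudocount) for c, m in zip(counts, in_set) if m]
    logs_out = [math.log(c + pseudocount) for c, m in zip(counts, in_set) if not m]
    if not logs_in or not logs_out:
        raise ValueError("the set and its complement must both be non-empty")

    n_in, n_out = len(logs_in), len(logs_out)
    # The mean of logs is the log of the geometric mean, so this difference is
    # the log-ratio of the two geometric means.
    log_ratio = sum(logs_in) / n_in - sum(logs_out) / n_out
    # Assumed scaling (ILR-style), intended to make sets of different sizes
    # more comparable.
    scale = math.sqrt(n_in * n_out / (n_in + n_out))
    return scale * log_ratio


# Hypothetical sample: four taxa, the first two belong to the set of interest.
sample = [9, 9, 0, 0]
membership = [True, True, False, False]
print(round(balance_score(sample, membership), 4))
```

A positive score means the set is over-represented relative to the rest of the sample, and a score near zero means no difference. With `pseudocount=0` the score is unchanged when all counts are multiplied by a constant, which is the property that makes log-ratios suitable for compositional data, but any zero count then raises an error from `math.log`.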