Cross-study analyses of microbial abundance using generalized common factor methods
BACKGROUND: By creating networks of biochemical pathways, communities of micro-organisms are able to modulate the properties of their environment and even the metabolic processes within their hosts. Next-generation high-throughput sequencing has led to a new frontier in microbial ecology, promising...
Main authors: | Hayes, Molly G.; Langille, Morgan G. I.; Gu, Hong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10561484/ https://www.ncbi.nlm.nih.gov/pubmed/37807043 http://dx.doi.org/10.1186/s12859-023-05509-4 |
_version_ | 1785117934224408576 |
---|---|
author | Hayes, Molly G.; Langille, Morgan G. I.; Gu, Hong |
author_sort | Hayes, Molly G. |
collection | PubMed |
description | BACKGROUND: By creating networks of biochemical pathways, communities of micro-organisms are able to modulate the properties of their environment and even the metabolic processes within their hosts. Next-generation high-throughput sequencing has led to a new frontier in microbial ecology, promising the ability to leverage the microbiome to make crucial advancements in the environmental and biomedical sciences. However, this is challenging, as genomic data are high-dimensional, sparse, and noisy. Much of this noise reflects the exact conditions under which sequencing took place, and is so significant that it limits consensus-based validation of study results. RESULTS: We propose an ensemble approach for cross-study exploratory analyses of microbial abundance data in which we first estimate the variance-covariance matrix of the underlying abundances from each dataset on the log scale assuming Poisson sampling, and subsequently model these covariances jointly so as to find a shared low-dimensional subspace of the feature space. CONCLUSIONS: By viewing the projection of the latent true abundances onto this common structure, the variation is pared down to that which is shared among all datasets, and is likely to reflect more generalizable biological signal than can be inferred from individual datasets. We investigate several ways of achieving this, demonstrate that they work well on simulated and real metagenomic data in terms of signal retention and interpretability, and recommend a particular implementation. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05509-4. |
format | Online Article Text |
id | pubmed-10561484 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-10561484 2023-10-10 Cross-study analyses of microbial abundance using generalized common factor methods Hayes, Molly G.; Langille, Morgan G. I.; Gu, Hong BMC Bioinformatics Research BioMed Central 2023-10-09 /pmc/articles/PMC10561484/ /pubmed/37807043 http://dx.doi.org/10.1186/s12859-023-05509-4 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Cross-study analyses of microbial abundance using generalized common factor methods |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10561484/ https://www.ncbi.nlm.nih.gov/pubmed/37807043 http://dx.doi.org/10.1186/s12859-023-05509-4 |
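
The RESULTS statement in the description outlines two steps: estimate, for each dataset, the covariance of the underlying log-scale abundances assuming Poisson sampling, then model those covariances jointly to find a shared low-dimensional subspace. The Python sketch below illustrates that idea only and is not the authors' implementation. It assumes a Poisson log-normal model and uses the standard moment identities for that model in the first step. For the second step it substitutes a deliberately simple stand-in (leading eigenvectors of the averaged covariance) for the generalized common factor methods named in the title. The function names `lognormal_poisson_cov` and `shared_subspace` are hypothetical.

```python
import numpy as np


def lognormal_poisson_cov(counts):
    """Moment estimate of Cov(log rate) for one samples-by-features count matrix.

    Assumes counts[i, j] ~ Poisson(rate[i, j]) with log-rates jointly normal.
    Under that assumption E[X_j X_k] = m_j m_k exp(S_jk) for j != k and
    E[X_j (X_j - 1)] = m_j^2 exp(S_jj), where m_j = E[X_j].
    Every moment involved must be positive, so very sparse features
    would need to be filtered out first.
    """
    x = np.asarray(counts, dtype=float)
    n_samples = x.shape[0]
    mean = x.mean(axis=0)
    second = x.T @ x / n_samples  # empirical E[X_j X_k]
    # On the diagonal, E[X_j^2] - E[X_j] equals the factorial moment
    # E[X_j (X_j - 1)], which removes the Poisson sampling variance.
    second[np.diag_indices_from(second)] -= mean
    return np.log(second / np.outer(mean, mean))


def shared_subspace(covariances, n_components):
    """Orthonormal basis from the top eigenvectors of the averaged covariance.

    This pooling step is an illustrative simplification, not the
    joint model proposed in the article.
    """
    pooled = np.mean(covariances, axis=0)
    eigenvalues, eigenvectors = np.linalg.eigh(pooled)
    order = np.argsort(eigenvalues)[::-1][:n_components]
    return eigenvectors[:, order]
```

In use, each study's count matrix (restricted to the same features in the same order) would be passed to `lognormal_poisson_cov`, and the resulting list of matrices to `shared_subspace`; projecting centered log-abundance estimates onto the returned basis gives the shared low-dimensional view described in the CONCLUSIONS statement.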