
Statistical Methods for Detecting Differentially Abundant Features in Clinical Metagenomic Samples

Numerous studies are currently underway to characterize the microbial communities inhabiting our world. These studies aim to dramatically expand our understanding of the microbial biosphere and, more importantly, hope to reveal the secrets of the complex symbiotic relationship between us and our com...


Bibliographic Details
Main Authors: White, James Robert; Nagarajan, Niranjan; Pop, Mihai
Format: Text
Language: English
Published: Public Library of Science 2009
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2661018/
https://www.ncbi.nlm.nih.gov/pubmed/19360128
http://dx.doi.org/10.1371/journal.pcbi.1000352
_version_ 1782165773287948288
author White, James Robert
Nagarajan, Niranjan
Pop, Mihai
collection PubMed
description Numerous studies are currently underway to characterize the microbial communities inhabiting our world. These studies aim to dramatically expand our understanding of the microbial biosphere and, more importantly, hope to reveal the secrets of the complex symbiotic relationship between us and our commensal bacterial microflora. An important prerequisite for such discoveries are computational tools that are able to rapidly and accurately compare large datasets generated from complex bacterial communities to identify features that distinguish them. We present a statistical method for comparing clinical metagenomic samples from two treatment populations on the basis of count data (e.g. as obtained through sequencing) to detect differentially abundant features. Our method, Metastats, employs the false discovery rate to improve specificity in high-complexity environments, and separately handles sparsely-sampled features using Fisher's exact test. Under a variety of simulations, we show that Metastats performs well compared to previously used methods, and significantly outperforms other methods for features with sparse counts. We demonstrate the utility of our method on several datasets including a 16S rRNA survey of obese and lean human gut microbiomes, COG functional profiles of infant and mature gut microbiomes, and bacterial and viral metabolic subsystem data inferred from random sequencing of 85 metagenomes. The application of our method to the obesity dataset reveals differences between obese and lean subjects not reported in the original study. For the COG and subsystem datasets, we provide the first statistically rigorous assessment of the differences between these populations. The methods described in this paper are the first to address clinical metagenomic datasets comprising samples from multiple subjects. Our methods are robust across datasets of varied complexity and sampling level. 
While designed for metagenomic applications, our software can also be applied to digital gene expression studies (e.g. SAGE). A web server implementation of our methods and freely available source code can be found at http://metastats.cbcb.umd.edu/.
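The two statistical ingredients named in the description, Fisher's exact test for sparsely sampled features and false discovery rate control, can be sketched as follows. This is a minimal illustration, not the Metastats implementation: the feature names, counts, and group totals are made up, pooling each feature's counts into a 2x2 table is an assumption, and Benjamini-Hochberg adjustment is used here only as a stand-in because the description does not say which false discovery rate procedure is applied.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical pooled counts: feature -> (count in group 1, count in group 2).
feature_counts = {"taxon_A": (1, 9), "taxon_B": (3, 4), "taxon_C": (0, 7)}
# Hypothetical total sequence counts per group (made-up numbers).
group_totals = (1000, 1000)

names = list(feature_counts)
pvals = []
for name in names:
    x1, x2 = feature_counts[name]
    # Rows: feature / all other sequences; columns: group 1 / group 2.
    pvals.append(fisher_exact_two_sided(
        x1, x2, group_totals[0] - x1, group_totals[1] - x2))

adjusted = benjamini_hochberg(pvals)
for name, p, q in zip(names, pvals, adjusted):
    print(f"{name}: p={p:.4f}, BH-adjusted={q:.4f}")
```

Both helpers use only the standard library so the sketch runs without extra dependencies. In practice a vetted routine such as `scipy.stats.fisher_exact` would be a reasonable substitute for the hand-written test.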
format Text
id pubmed-2661018
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2661018 2009-04-10 Statistical Methods for Detecting Differentially Abundant Features in Clinical Metagenomic Samples White, James Robert Nagarajan, Niranjan Pop, Mihai PLoS Comput Biol Research Article Public Library of Science 2009-04-10 /pmc/articles/PMC2661018/ /pubmed/19360128 http://dx.doi.org/10.1371/journal.pcbi.1000352 Text en White et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
White, James Robert
Nagarajan, Niranjan
Pop, Mihai
Statistical Methods for Detecting Differentially Abundant Features in Clinical Metagenomic Samples
title Statistical Methods for Detecting Differentially Abundant Features in Clinical Metagenomic Samples
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2661018/
https://www.ncbi.nlm.nih.gov/pubmed/19360128
http://dx.doi.org/10.1371/journal.pcbi.1000352