
Waste Not, Want Not: Why Rarefying Microbiome Data Is Inadmissible

Current practice in the normalization of microbiome count data is inefficient in the statistical sense. For apparently historical reasons, the common approach is either to use simple proportions (which does not address heteroscedasticity) or to use rarefying of counts, even though both of these appr...


Bibliographic Details
Main authors: McMurdie, Paul J.; Holmes, Susan
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3974642/
https://www.ncbi.nlm.nih.gov/pubmed/24699258
http://dx.doi.org/10.1371/journal.pcbi.1003531
author McMurdie, Paul J.
Holmes, Susan
collection PubMed
description Current practice in the normalization of microbiome count data is inefficient in the statistical sense. For apparently historical reasons, the common approach is either to use simple proportions (which does not address heteroscedasticity) or to use rarefying of counts, even though both of these approaches are inappropriate for detection of differentially abundant species. Well-established statistical theory is available that simultaneously accounts for library size differences and biological variability using an appropriate mixture model. Moreover, specific implementations for DNA sequencing read count data (based on a Negative Binomial model for instance) are already available in RNA-Seq focused R packages such as edgeR and DESeq. Here we summarize the supporting statistical theory and use simulations and empirical data to demonstrate substantial improvements provided by a relevant mixture model framework over simple proportions or rarefying. We show how both proportions and rarefied counts result in a high rate of false positives in tests for species that are differentially abundant across sample classes. Regarding microbiome sample-wise clustering, we also show that the rarefying procedure often discards samples that can be accurately clustered by alternative methods. We further compare different Negative Binomial methods with a recently-described zero-inflated Gaussian mixture, implemented in a package called metagenomeSeq. We find that metagenomeSeq performs well when there is an adequate number of biological replicates, but it nevertheless tends toward a higher false positive rate. Based on these results and well-established statistical theory, we advocate that investigators avoid rarefying altogether. We have provided microbiome-specific extensions to these tools in the R package, phyloseq.
format Online
Article
Text
id pubmed-3974642
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3974642 2014-04-08 Waste Not, Want Not: Why Rarefying Microbiome Data Is Inadmissible McMurdie, Paul J. Holmes, Susan PLoS Comput Biol Research Article
Public Library of Science 2014-04-03 /pmc/articles/PMC3974642/ /pubmed/24699258 http://dx.doi.org/10.1371/journal.pcbi.1003531 Text en © 2014 McMurdie, Holmes http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Waste Not, Want Not: Why Rarefying Microbiome Data Is Inadmissible
topic Research Article