Analysis of Microbiome Data in the Presence of Excess Zeros

Motivation: An important feature of microbiome count data is the presence of a large number of zeros. A common strategy to handle these excess zeros is to add a small number called pseudo-count (e.g., 1). Other strategies include using various probability models to model the excess zero counts. Alth...

Bibliographic Details
Main authors: Kaul, Abhishek; Mandal, Siddhartha; Davidov, Ori; Peddada, Shyamal D.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5682008/
https://www.ncbi.nlm.nih.gov/pubmed/29163406
http://dx.doi.org/10.3389/fmicb.2017.02114
_version_ 1783278022439206912
author Kaul, Abhishek
Mandal, Siddhartha
Davidov, Ori
Peddada, Shyamal D.
author_facet Kaul, Abhishek
Mandal, Siddhartha
Davidov, Ori
Peddada, Shyamal D.
author_sort Kaul, Abhishek
collection PubMed
description Motivation: An important feature of microbiome count data is the presence of a large number of zeros. A common strategy to handle these excess zeros is to add a small number called a pseudo-count (e.g., 1). Other strategies include using various probability models to model the excess zero counts. Although adding a pseudo-count is simple and widely used, as demonstrated in this paper, it is not ideal. On the other hand, methods that model excess zeros using a probability model often make an implicit assumption that all zeros can be explained by a common probability model. As described in this article, this is not always recommended, as there are potentially three types/sources of zeros in microbiome data. The purpose of this paper is to develop a simple methodology to identify and accommodate three different types of zeros and to test hypotheses regarding the relative abundance of taxa in two or more experimental groups. Another major contribution of this paper is to perform constrained (directional or ordered) inference when there are more than two ordered experimental groups (e.g., subjects ordered by diet, age group, or environmental exposure group). As far as we know, this is the first paper that addresses such problems in the analysis of microbiome data. Results: Using extensive simulation studies, we demonstrate that the proposed methodology not only controls the false discovery rate (FDR) at a desired level of significance but also competes well in terms of power with DESeq2, a popular procedure derived from the RNA-Seq literature. As expected, the method using pseudo-counts tends to be very conservative, and the classical t-test that ignores the underlying simplex structure in the data has an inflated FDR.
format Online
Article
Text
id pubmed-5682008
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-5682008 2017-11-21 Analysis of Microbiome Data in the Presence of Excess Zeros Kaul, Abhishek Mandal, Siddhartha Davidov, Ori Peddada, Shyamal D. Front Microbiol Microbiology Frontiers Media S.A. 
2017-11-07 /pmc/articles/PMC5682008/ /pubmed/29163406 http://dx.doi.org/10.3389/fmicb.2017.02114 Text en Copyright © 2017 Kaul, Mandal, Davidov and Peddada. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Microbiology
Kaul, Abhishek
Mandal, Siddhartha
Davidov, Ori
Peddada, Shyamal D.
Analysis of Microbiome Data in the Presence of Excess Zeros
title Analysis of Microbiome Data in the Presence of Excess Zeros
title_full Analysis of Microbiome Data in the Presence of Excess Zeros
title_fullStr Analysis of Microbiome Data in the Presence of Excess Zeros
title_full_unstemmed Analysis of Microbiome Data in the Presence of Excess Zeros
title_short Analysis of Microbiome Data in the Presence of Excess Zeros
title_sort analysis of microbiome data in the presence of excess zeros
topic Microbiology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5682008/
https://www.ncbi.nlm.nih.gov/pubmed/29163406
http://dx.doi.org/10.3389/fmicb.2017.02114
work_keys_str_mv AT kaulabhishek analysisofmicrobiomedatainthepresenceofexcesszeros
AT mandalsiddhartha analysisofmicrobiomedatainthepresenceofexcesszeros
AT davidovori analysisofmicrobiomedatainthepresenceofexcesszeros
AT peddadashyamald analysisofmicrobiomedatainthepresenceofexcesszeros