Analysis of Microbiome Data in the Presence of Excess Zeros
Motivation: An important feature of microbiome count data is the presence of a large number of zeros. A common strategy to handle these excess zeros is to add a small number called pseudo-count (e.g., 1). Other strategies include using various probability models to model the excess zero counts. Alth...
Main authors: | Kaul, Abhishek; Mandal, Siddhartha; Davidov, Ori; Peddada, Shyamal D. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2017 |
Subjects: | Microbiology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5682008/ https://www.ncbi.nlm.nih.gov/pubmed/29163406 http://dx.doi.org/10.3389/fmicb.2017.02114 |
_version_ | 1783278022439206912 |
---|---|
author | Kaul, Abhishek Mandal, Siddhartha Davidov, Ori Peddada, Shyamal D. |
author_facet | Kaul, Abhishek Mandal, Siddhartha Davidov, Ori Peddada, Shyamal D. |
author_sort | Kaul, Abhishek |
collection | PubMed |
description | Motivation: An important feature of microbiome count data is the presence of a large number of zeros. A common strategy to handle these excess zeros is to add a small number called a pseudo-count (e.g., 1). Other strategies include using various probability models to model the excess zero counts. Although adding a pseudo-count is simple and widely used, as demonstrated in this paper, it is not ideal. On the other hand, methods that model excess zeros using a probability model often make an implicit assumption that all zeros can be explained by a common probability model. As described in this article, this is not always recommended, as there are potentially three types/sources of zeros in microbiome data. The purpose of this paper is to develop a simple methodology to identify and accommodate three different types of zeros and to test hypotheses regarding the relative abundance of taxa in two or more experimental groups. Another major contribution of this paper is to perform constrained (directional or ordered) inference when there are more than two ordered experimental groups (e.g., subjects ordered by diet or age groups or environmental exposure groups). As far as we know, this is the first paper that addresses such problems in the analysis of microbiome data. Results: Using extensive simulation studies, we demonstrate that the proposed methodology controls the false discovery rate (FDR) at a desired level of significance while competing well in terms of power with DESeq2, a popular procedure derived from the RNA-Seq literature. As expected, the method using pseudo-counts tends to be very conservative, and the classical t-test, which ignores the underlying simplex structure in the data, has an inflated FDR. |
format | Online Article Text |
id | pubmed-5682008 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-5682008 2017-11-21 Analysis of Microbiome Data in the Presence of Excess Zeros Kaul, Abhishek Mandal, Siddhartha Davidov, Ori Peddada, Shyamal D. Front Microbiol Microbiology Frontiers Media S.A.
2017-11-07 /pmc/articles/PMC5682008/ /pubmed/29163406 http://dx.doi.org/10.3389/fmicb.2017.02114 Text en Copyright © 2017 Kaul, Mandal, Davidov and Peddada. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Analysis of Microbiome Data in the Presence of Excess Zeros |
topic | Microbiology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5682008/ https://www.ncbi.nlm.nih.gov/pubmed/29163406 http://dx.doi.org/10.3389/fmicb.2017.02114 |