
Analysis of Microbiome Data in the Presence of Excess Zeros

Motivation: An important feature of microbiome count data is the presence of a large number of zeros. A common strategy to handle these excess zeros is to add a small number called pseudo-count (e.g., 1). Other strategies include using various probability models to model the excess zero counts. Alth...


Bibliographic Details
Main Authors:	Kaul, Abhishek; Mandal, Siddhartha; Davidov, Ori; Peddada, Shyamal D.
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2017
Subjects:	Microbiology
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5682008/ https://www.ncbi.nlm.nih.gov/pubmed/29163406 http://dx.doi.org/10.3389/fmicb.2017.02114

_version_	1783278022439206912
author	Kaul, Abhishek Mandal, Siddhartha Davidov, Ori Peddada, Shyamal D.
author_facet	Kaul, Abhishek Mandal, Siddhartha Davidov, Ori Peddada, Shyamal D.
author_sort	Kaul, Abhishek
collection	PubMed
description	Motivation: An important feature of microbiome count data is the presence of a large number of zeros. A common strategy to handle these excess zeros is to add a small number called a pseudo-count (e.g., 1). Other strategies include using various probability models to model the excess zero counts. Although adding a pseudo-count is simple and widely used, as demonstrated in this paper, it is not ideal. On the other hand, methods that model excess zeros using a probability model often make an implicit assumption that all zeros can be explained by a common probability model. As described in this article, this is not always recommended, as there are potentially three types/sources of zeros in microbiome data. The purpose of this paper is to develop a simple methodology to identify and accommodate three different types of zeros and to test hypotheses regarding the relative abundance of taxa in two or more experimental groups. Another major contribution of this paper is to perform constrained (directional or ordered) inference when there are more than two ordered experimental groups (e.g., subjects ordered by diet or age groups or environmental exposure groups). As far as we know, this is the first paper that addresses such problems in the analysis of microbiome data. Results: Using extensive simulation studies, we demonstrate that the proposed methodology not only controls the false discovery rate at a desired level of significance but also competes well in terms of power with DESeq2, a popular procedure derived from the RNASeq literature. As expected, the method using pseudo-counts tends to be very conservative, and the classical t-test that ignores the underlying simplex structure in the data has an inflated FDR.
format	Online Article Text
id	pubmed-5682008
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-5682008 2017-11-21 Analysis of Microbiome Data in the Presence of Excess Zeros Kaul, Abhishek Mandal, Siddhartha Davidov, Ori Peddada, Shyamal D. Front Microbiol Microbiology Frontiers Media S.A. 2017-11-07 /pmc/articles/PMC5682008/ /pubmed/29163406 http://dx.doi.org/10.3389/fmicb.2017.02114 Text en Copyright © 2017 Kaul, Mandal, Davidov and Peddada. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Analysis of Microbiome Data in the Presence of Excess Zeros
topic	Microbiology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5682008/ https://www.ncbi.nlm.nih.gov/pubmed/29163406 http://dx.doi.org/10.3389/fmicb.2017.02114
