Sequence count data are poorly fit by the negative binomial distribution

Sequence count data are commonly modelled using the negative binomial (NB) distribution. Several empirical studies, however, have demonstrated that methods based on the NB-assumption do not always succeed in controlling the false discovery rate (FDR) at its nominal level. In this paper, we propose a...

Bibliographic Details
Main Authors:	Hawinkel, Stijn; Rayner, J. C. W.; Bijnens, Luc; Thas, Olivier
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2020
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7192467/ https://www.ncbi.nlm.nih.gov/pubmed/32352970 http://dx.doi.org/10.1371/journal.pone.0224909

collection	PubMed
id	pubmed-7192467
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	PLoS One
publishDate	2020-04-30
description	Sequence count data are commonly modelled using the negative binomial (NB) distribution. Several empirical studies, however, have demonstrated that methods based on the NB-assumption do not always succeed in controlling the false discovery rate (FDR) at its nominal level. In this paper, we propose a dedicated statistical goodness of fit test for the NB distribution in regression models and demonstrate that the NB-assumption is violated in many publicly available RNA-Seq and 16S rRNA microbiome datasets. The zero-inflated NB distribution was not found to give a substantially better fit. We also show that the NB-based tests perform worse on the features for which the NB-assumption was violated than on the features for which no significant deviation was detected. This gives an explanation for the poor behaviour of NB-based tests in many published evaluation studies. We conclude that nonparametric tests should be preferred over parametric methods.
rights	© 2020 Hawinkel et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
