Sequence count data are poorly fit by the negative binomial distribution
Sequence count data are commonly modelled using the negative binomial (NB) distribution. Several empirical studies, however, have demonstrated that methods based on the NB-assumption do not always succeed in controlling the false discovery rate (FDR) at its nominal level. In this paper, we propose a...
Main Authors: | Hawinkel, Stijn; Rayner, J. C. W.; Bijnens, Luc; Thas, Olivier |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7192467/ https://www.ncbi.nlm.nih.gov/pubmed/32352970 http://dx.doi.org/10.1371/journal.pone.0224909 |
_version_ | 1783528015370649600 |
---|---|
author | Hawinkel, Stijn Rayner, J. C. W. Bijnens, Luc Thas, Olivier |
author_facet | Hawinkel, Stijn Rayner, J. C. W. Bijnens, Luc Thas, Olivier |
author_sort | Hawinkel, Stijn |
collection | PubMed |
description | Sequence count data are commonly modelled using the negative binomial (NB) distribution. Several empirical studies, however, have demonstrated that methods based on the NB-assumption do not always succeed in controlling the false discovery rate (FDR) at its nominal level. In this paper, we propose a dedicated statistical goodness of fit test for the NB distribution in regression models and demonstrate that the NB-assumption is violated in many publicly available RNA-Seq and 16S rRNA microbiome datasets. The zero-inflated NB distribution was not found to give a substantially better fit. We also show that the NB-based tests perform worse on the features for which the NB-assumption was violated than on the features for which no significant deviation was detected. This gives an explanation for the poor behaviour of NB-based tests in many published evaluation studies. We conclude that nonparametric tests should be preferred over parametric methods. |
format | Online Article Text |
id | pubmed-7192467 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-71924672020-05-11 Sequence count data are poorly fit by the negative binomial distribution Hawinkel, Stijn Rayner, J. C. W. Bijnens, Luc Thas, Olivier PLoS One Research Article Sequence count data are commonly modelled using the negative binomial (NB) distribution. Several empirical studies, however, have demonstrated that methods based on the NB-assumption do not always succeed in controlling the false discovery rate (FDR) at its nominal level. In this paper, we propose a dedicated statistical goodness of fit test for the NB distribution in regression models and demonstrate that the NB-assumption is violated in many publicly available RNA-Seq and 16S rRNA microbiome datasets. The zero-inflated NB distribution was not found to give a substantially better fit. We also show that the NB-based tests perform worse on the features for which the NB-assumption was violated than on the features for which no significant deviation was detected. This gives an explanation for the poor behaviour of NB-based tests in many published evaluation studies. We conclude that nonparametric tests should be preferred over parametric methods. Public Library of Science 2020-04-30 /pmc/articles/PMC7192467/ /pubmed/32352970 http://dx.doi.org/10.1371/journal.pone.0224909 Text en © 2020 Hawinkel et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Hawinkel, Stijn Rayner, J. C. W. Bijnens, Luc Thas, Olivier Sequence count data are poorly fit by the negative binomial distribution |
title | Sequence count data are poorly fit by the negative binomial distribution |
title_full | Sequence count data are poorly fit by the negative binomial distribution |
title_fullStr | Sequence count data are poorly fit by the negative binomial distribution |
title_full_unstemmed | Sequence count data are poorly fit by the negative binomial distribution |
title_short | Sequence count data are poorly fit by the negative binomial distribution |
title_sort | sequence count data are poorly fit by the negative binomial distribution |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7192467/ https://www.ncbi.nlm.nih.gov/pubmed/32352970 http://dx.doi.org/10.1371/journal.pone.0224909 |