Technical and biological variance structure in mRNA-Seq data: life in the real world

Bibliographic Details
Main authors: Oberg, Ann L, Bot, Brian M, Grill, Diane E, Poland, Gregory A, Therneau, Terry M
Format: Online, Article, Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3505161/
https://www.ncbi.nlm.nih.gov/pubmed/22769017
http://dx.doi.org/10.1186/1471-2164-13-304
Collection: PubMed
Description:
BACKGROUND: mRNA expression data from next-generation sequencing platforms are obtained in the form of counts per gene or exon. Counts have classically been assumed to follow a Poisson distribution, in which the variance is equal to the mean. The Negative Binomial distribution, which allows for over-dispersion (i.e., a variance greater than the mean), is also commonly used to model count data.
RESULTS: In mRNA-Seq data from 25 subjects, we found that technical variation generally followed a Poisson distribution, as has been reported previously, whereas biological variability was over-dispersed relative to the Poisson model. The mean-variance relationship across all genes was quadratic, consistent with a Negative Binomial (NB) distribution. Over-dispersed Poisson and NB distributional assumptions demonstrated marked improvements in goodness-of-fit (GOF) over the standard Poisson model assumptions, but with evidence of over-fitting in some genes. Modeling of experimental effects improved GOF for high-variance genes but increased the over-fitting problem.
CONCLUSIONS: These findings will guide the development of analytical strategies for accurately modeling the variance structure in these data and for sample size determination, which in turn will aid in identifying true biological signals that inform our understanding of biological systems.
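The abstract contrasts three mean-variance assumptions for a gene with mean count μ: Poisson (Var = μ), over-dispersed or quasi-Poisson (Var = φμ for a constant scale φ), and Negative Binomial (Var = μ + μ²/θ, quadratic in μ). The following is a minimal illustrative sketch, not the authors' analysis code, using only the Python standard library. It assumes hypothetical "technical" replicates drawn as Poisson and hypothetical "biological" replicates drawn as a gamma-Poisson (NB) mixture; the gene means, the dispersion θ = 5, the function names, and the least-squares estimate of θ are all illustrative choices, and the paper's own GOF comparisons are not reproduced.

```python
# Illustrative sketch only: simulated data, not the study's data or methods.
import math
import random
from statistics import fmean, variance


def rpois(lam: float, rng: random.Random) -> int:
    """One Poisson(lam) draw via Knuth's multiplication method.

    lam is consumed in chunks of at most 30 so exp(-chunk) never underflows;
    this is valid because a sum of independent Poissons is Poisson.
    """
    total = 0
    while lam > 0:
        step = min(lam, 30.0)
        limit = math.exp(-step)
        k, p = 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        total += k
        lam -= step
    return total


def rnegbin(mu: float, theta: float, rng: random.Random) -> int:
    """NB draw as a gamma-Poisson mixture, so Var = mu + mu**2 / theta."""
    rate = rng.gammavariate(theta, mu / theta)  # mean mu, variance mu**2 / theta
    return rpois(rate, rng)


def simulate(means, n, theta, seed=1):
    """Per hypothetical gene: (mu, Poisson mean, Poisson var, NB mean, NB var)."""
    rng = random.Random(seed)
    rows = []
    for mu in means:
        tech = [rpois(mu, rng) for _ in range(n)]  # technical-like: Poisson
        bio = [rnegbin(mu, theta, rng) for _ in range(n)]  # biological-like: NB
        rows.append((mu, fmean(tech), variance(tech), fmean(bio), variance(bio)))
    return rows


def fit_theta(rows):
    """Least squares through the origin of (var - mean) on mean**2; slope = 1/theta."""
    x = [bio_mean ** 2 for _, _, _, bio_mean, _ in rows]
    y = [bio_var - bio_mean for _, _, _, bio_mean, bio_var in rows]
    slope = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    return 1.0 / slope


if __name__ == "__main__":
    rows = simulate(means=[5, 20, 50, 100], n=1000, theta=5.0)
    for mu, tm, tv, bm, bv in rows:
        # Poisson ratio stays near 1; NB ratio grows roughly like 1 + mu/theta
        print(f"mu={mu:>3}  Poisson var/mean={tv / tm:5.2f}  NB var/mean={bv / bm:5.2f}")
    print(f"theta_hat = {fit_theta(rows):.2f} (simulated with theta = 5)")
```

A single quasi-Poisson scale φ would have to match every gene's variance-to-mean ratio at once; when that ratio grows with the mean, as it does for the simulated NB counts, a quadratic NB variance function describes the genes better. This only mirrors the abstract's qualitative finding on simulated data.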
Record ID: pubmed-3505161
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Record: pubmed-3505161 (2012-11-29)
Journal: BMC Genomics (Research Article); BioMed Central, 2012-07-07
Copyright ©2012 Oberg et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.