Differential expression analysis for RNAseq using Poisson mixed models

Identifying differentially expressed (DE) genes from RNA sequencing (RNAseq) studies is among the most common analyses in genomics. However, RNAseq DE analysis presents several statistical and computational challenges, including over-dispersed read counts and, in some settings, sample non-independen...

Bibliographic Details
Main authors: Sun, Shiquan; Hood, Michelle; Scott, Laura; Peng, Qinke; Mukherjee, Sayan; Tung, Jenny; Zhou, Xiang
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects: Methods Online
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5499851/
https://www.ncbi.nlm.nih.gov/pubmed/28369632
http://dx.doi.org/10.1093/nar/gkx204
author Sun, Shiquan
Hood, Michelle
Scott, Laura
Peng, Qinke
Mukherjee, Sayan
Tung, Jenny
Zhou, Xiang
collection PubMed
description Identifying differentially expressed (DE) genes from RNA sequencing (RNAseq) studies is among the most common analyses in genomics. However, RNAseq DE analysis presents several statistical and computational challenges, including over-dispersed read counts and, in some settings, sample non-independence. Previous count-based methods rely on simple hierarchical Poisson models (e.g. negative binomial) to model independent over-dispersion, but do not account for sample non-independence due to relatedness, population structure and/or hidden confounders. Here, we present a Poisson mixed model with two random effects terms that account for both independent over-dispersion and sample non-independence. We also develop a scalable sampling-based inference algorithm using a latent variable representation of the Poisson distribution. With simulations, we show that our method properly controls type I error and is generally more powerful than other widely used approaches, except in small samples (n < 15) with other unfavorable properties (e.g. small effect sizes). We also apply our method to three real datasets that contain related individuals, population stratification or hidden confounders. Our results show that our method increases power in all three datasets compared with other approaches, though the power gain is smallest in the smallest sample (n = 6). Our method is implemented in MACAU, freely available at www.xzlab.org/software.html.
format Online
Article
Text
id pubmed-5499851
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5499851 2017-07-12
Differential expression analysis for RNAseq using Poisson mixed models
Sun, Shiquan; Hood, Michelle; Scott, Laura; Peng, Qinke; Mukherjee, Sayan; Tung, Jenny; Zhou, Xiang
Nucleic Acids Res
Methods Online
Oxford University Press 2017-06-20 2017-03-29
/pmc/articles/PMC5499851/ /pubmed/28369632 http://dx.doi.org/10.1093/nar/gkx204
Text en
© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Differential expression analysis for RNAseq using Poisson mixed models
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5499851/
https://www.ncbi.nlm.nih.gov/pubmed/28369632
http://dx.doi.org/10.1093/nar/gkx204
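
The description field in this record mentions a Poisson mixed model with two random effects terms, one for independent over-dispersion and one for sample non-independence. The record gives no equations, so the following is only an illustrative simulation sketch of one plausible form of such a model, not the MACAU implementation. The function name `simulate_pmm_counts`, the parameters `mu`, `beta`, `sigma2` and `h2`, the log link, and the toy relatedness matrix `K` are all assumptions introduced for this example.

```python
import numpy as np


def simulate_pmm_counts(x, K, total_reads, mu=-8.0, beta=0.5,
                        sigma2=0.3, h2=0.5, seed=0):
    """Simulate read counts for one gene under an assumed Poisson mixed model.

    Assumed form (not taken from the record):
        log(rate_i) = mu + beta * x_i + g_i + e_i
        g ~ MVN(0, sigma2 * h2 * K)          # correlated across related samples
        e ~ N(0, sigma2 * (1 - h2)) i.i.d.   # independent over-dispersion
        y_i ~ Poisson(total_reads_i * rate_i)
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    # Random effect 1: covariance follows the relatedness matrix K.
    g = rng.multivariate_normal(np.zeros(n), sigma2 * h2 * K)
    # Random effect 2: independent noise for each sample.
    e = rng.normal(0.0, np.sqrt(sigma2 * (1.0 - h2)), size=n)
    log_rate = mu + beta * x + g + e
    return rng.poisson(total_reads * np.exp(log_rate))


# Toy data: 40 samples, two groups, samples paired as relatives (hypothetical).
n = 40
x = np.repeat([0.0, 1.0], n // 2)                       # predictor of interest
K = np.kron(np.eye(n // 2), np.array([[1.0, 0.5],
                                      [0.5, 1.0]]))     # block-diagonal relatedness
total_reads = np.full(n, 1_500_000)                     # per-sample read depth
counts = simulate_pmm_counts(x, K, total_reads)
print(counts.mean(), counts.var())
```

Under these assumptions the sample variance of `counts` is far larger than its mean, which is the over-dispersion that a plain Poisson model cannot capture, and samples within the same pair in `K` share part of their deviation through `g`.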