
Gene expression variability and the analysis of large-scale RNA-seq studies with the MDSeq

Rapidly decreasing cost of next-generation sequencing has led to the recent availability of large-scale RNA-seq data, that empowers the analysis of gene expression variability, in addition to gene expression means. In this paper, we present the MDSeq, based on the coefficient of dispersion, to provi...


Bibliographic Details
Main authors: Ran, Di; Daye, Z. John
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects: Methods Online
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5737414/
https://www.ncbi.nlm.nih.gov/pubmed/28535263
http://dx.doi.org/10.1093/nar/gkx456
author Ran, Di
Daye, Z. John
collection PubMed
description Rapidly decreasing cost of next-generation sequencing has led to the recent availability of large-scale RNA-seq data, that empowers the analysis of gene expression variability, in addition to gene expression means. In this paper, we present the MDSeq, based on the coefficient of dispersion, to provide robust and computationally efficient analysis of both gene expression means and variability on RNA-seq counts. The MDSeq utilizes a novel reparametrization of the negative binomial to provide flexible generalized linear models (GLMs) on both the mean and dispersion. We address challenges of analyzing large-scale RNA-seq data via several new developments to provide a comprehensive toolset that models technical excess zeros, identifies outliers efficiently, and evaluates differential expressions at biologically interesting levels. We evaluated performances of the MDSeq using simulated data when the ground truths are known. Results suggest that the MDSeq often outperforms current methods for the analysis of gene expression mean and variability. Moreover, the MDSeq is applied in two real RNA-seq studies, in which we identified functionally relevant genes and gene pathways. Specifically, the analysis of gene expression variability with the MDSeq on the GTEx human brain tissue data has identified pathways associated with common neurodegenerative disorders when gene expression means were conserved.
format Online
Article
Text
id pubmed-5737414
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5737414 2018-01-09 Gene expression variability and the analysis of large-scale RNA-seq studies with the MDSeq Ran, Di Daye, Z. John Nucleic Acids Res Methods Online Rapidly decreasing cost of next-generation sequencing has led to the recent availability of large-scale RNA-seq data, that empowers the analysis of gene expression variability, in addition to gene expression means. In this paper, we present the MDSeq, based on the coefficient of dispersion, to provide robust and computationally efficient analysis of both gene expression means and variability on RNA-seq counts. The MDSeq utilizes a novel reparametrization of the negative binomial to provide flexible generalized linear models (GLMs) on both the mean and dispersion. We address challenges of analyzing large-scale RNA-seq data via several new developments to provide a comprehensive toolset that models technical excess zeros, identifies outliers efficiently, and evaluates differential expressions at biologically interesting levels. We evaluated performances of the MDSeq using simulated data when the ground truths are known. Results suggest that the MDSeq often outperforms current methods for the analysis of gene expression mean and variability. Moreover, the MDSeq is applied in two real RNA-seq studies, in which we identified functionally relevant genes and gene pathways. Specifically, the analysis of gene expression variability with the MDSeq on the GTEx human brain tissue data has identified pathways associated with common neurodegenerative disorders when gene expression means were conserved. Oxford University Press 2017-07-27 2017-05-23 /pmc/articles/PMC5737414/ /pubmed/28535263 http://dx.doi.org/10.1093/nar/gkx456 Text en © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Gene expression variability and the analysis of large-scale RNA-seq studies with the MDSeq
topic Methods Online
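Note on the description: the abstract refers to a reparametrization of the negative binomial in terms of the mean and the coefficient of dispersion. The record does not give the formula, so the following is only a minimal sketch under a stated assumption: the coefficient of dispersion is taken to be the variance-to-mean ratio, phi = variance / mean, and the standard negative binomial (size r, success probability p) is recovered from (mu, phi) as p = 1/phi and r = mu/(phi - 1). The function names are hypothetical and are not taken from the MDSeq software.

```python
import math


def nb_params_from_mean_dispersion(mu, phi):
    """Map (mean, variance-to-mean ratio) to standard NB parameters (r, p).

    Assumption: phi = variance / mean. The negative binomial is
    overdispersed, so this mapping only exists for phi > 1.
    """
    if mu <= 0 or phi <= 1:
        raise ValueError("requires mu > 0 and phi > 1")
    p = 1.0 / phi          # success probability
    r = mu / (phi - 1.0)   # size (number of successes)
    return r, p


def nb_log_pmf(y, mu, phi):
    """Log-probability of count y under the NB with mean mu and dispersion phi."""
    r, p = nb_params_from_mean_dispersion(mu, phi)
    return (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(p) + y * math.log1p(-p)
    )


def nb_moments(mu, phi, upper=2000):
    """Numerically sum the pmf over 0..upper-1 to get the mean and variance."""
    probs = [math.exp(nb_log_pmf(y, mu, phi)) for y in range(upper)]
    mean = sum(y * q for y, q in enumerate(probs))
    var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))
    return mean, var
```

Under this parametrization the mean is mu and the variance is phi * mu, so a model can place one linear predictor on log(mu) and a separate one on log(phi), which is one way to read the "GLMs on both the mean and dispersion" wording in the abstract.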