
Hierarchical probabilistic models for multiple gene/variant associations based on next-generation sequencing data


Bibliographic details
Main authors:	Vavoulis, Dimitrios V; Taylor, Jenny C; Schuh, Anna
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2017
Subjects:	Original Papers
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5637939/ https://www.ncbi.nlm.nih.gov/pubmed/28575251 http://dx.doi.org/10.1093/bioinformatics/btx355

author	Vavoulis, Dimitrios V; Taylor, Jenny C; Schuh, Anna
collection	PubMed
description	MOTIVATION: The identification of genetic variants influencing gene expression (known as expression quantitative trait loci or eQTLs) is important in unravelling the genetic basis of complex traits. Detecting multiple eQTLs simultaneously in a population based on paired DNA-seq and RNA-seq assays employs two competing types of models: models which rely on appropriate transformations of RNA-seq data (and are powered by a mature mathematical theory), or count-based models, which represent digital gene expression explicitly, thus rendering such transformations unnecessary. The latter constitutes an immensely popular methodology, which is however plagued by mathematical intractability. RESULTS: We develop tractable count-based models, which are amenable to efficient estimation through the introduction of latent variables and the appropriate application of recent statistical theory in a sparse Bayesian modelling framework. Furthermore, we examine several transformation methods for RNA-seq read counts and we introduce arcsin, logit and Laplace smoothing as preprocessing steps for transformation-based models. Using natural and carefully simulated data from the 1000 Genomes and gEUVADIS projects, we benchmark both approaches under a variety of scenarios, including the presence of noise and violation of basic model assumptions. We demonstrate that an arcsin transformation of Laplace-smoothed data is at least as good as state-of-the-art models, particularly at small samples. Furthermore, we show that an over-dispersed Poisson model is comparable to the celebrated Negative Binomial, but much easier to estimate. These results provide strong support for transformation-based versus count-based (particularly Negative-Binomial-based) models for eQTL mapping. AVAILABILITY AND IMPLEMENTATION: All methods are implemented in the free software eQTLseq: https://github.com/dvav/eQTLseq SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
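The abstract names an arcsine transformation of Laplace-smoothed read counts, and a logit alternative, as preprocessing steps for transformation-based models. The NumPy sketch below illustrates the idea under stated assumptions: "Laplace smoothing" is taken to mean adding a pseudocount of 1 to every count before scaling each sample to proportions, and "arcsin" is taken to be the arcsine square-root transform. The function names and the samples-by-genes layout are hypothetical, and the exact normalization used in eQTLseq may differ.

```python
import numpy as np


def laplace_smoothed_proportions(counts):
    """Assumed Laplace smoothing: add a pseudocount of 1 to every read count,
    then divide by each sample's total so every row sums to 1.

    counts: 2-D array of read counts, rows = samples, columns = genes
    (a hypothetical layout chosen for this sketch).
    """
    counts = np.asarray(counts, dtype=float)
    smoothed = counts + 1.0  # with two or more genes, no proportion is exactly 0 or 1
    return smoothed / smoothed.sum(axis=1, keepdims=True)


def arcsin_transform(p):
    # Arcsine square-root transform of proportions (assumed form of "arcsin").
    return np.arcsin(np.sqrt(p))


def logit_transform(p):
    # Log-odds log(p / (1 - p)); finite because smoothing keeps p away from 0 and 1.
    return np.log(p) - np.log1p(-p)


# Toy counts for 2 samples x 3 genes, including zero counts.
counts = np.array([[0, 5, 15],
                   [10, 0, 30]])
p = laplace_smoothed_proportions(counts)
y_arcsin = arcsin_transform(p)
y_logit = logit_transform(p)
```

The pseudocount matters mainly for the logit, which is undefined for zero counts; the arcsine transform is defined at 0 but still benefits from the shared smoothing step.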
id	pubmed-5637939
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-5637939 2017-10-17; Oxford University Press 2017-10-01 2017-05-31; /pmc/articles/PMC5637939/ /pubmed/28575251 http://dx.doi.org/10.1093/bioinformatics/btx355; Text en; © The Author 2017. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.