Calling genotypes from public RNA-sequencing data enables identification of genetic variants that affect gene-expression levels

BACKGROUND: RNA-sequencing (RNA-seq) is a powerful technique for the identification of genetic variants that affect gene-expression levels, either through expression quantitative trait locus (eQTL) mapping or through allele-specific expression (ASE) analysis. Given increasing numbers of RNA-seq samp...

Bibliographic Details
Main authors: Deelen, Patrick, Zhernakova, Daria V, de Haan, Mark, van der Sijde, Marijke, Bonder, Marc Jan, Karjalainen, Juha, van der Velde, K Joeri, Abbott, Kristin M, Fu, Jingyuan, Wijmenga, Cisca, Sinke, Richard J, Swertz, Morris A, Franke, Lude
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4423486/
https://www.ncbi.nlm.nih.gov/pubmed/25954321
http://dx.doi.org/10.1186/s13073-015-0152-4
_version_ 1782370221114261504
author Deelen, Patrick
Zhernakova, Daria V
de Haan, Mark
van der Sijde, Marijke
Bonder, Marc Jan
Karjalainen, Juha
van der Velde, K Joeri
Abbott, Kristin M
Fu, Jingyuan
Wijmenga, Cisca
Sinke, Richard J
Swertz, Morris A
Franke, Lude
author_sort Deelen, Patrick
collection PubMed
description BACKGROUND: RNA-sequencing (RNA-seq) is a powerful technique for the identification of genetic variants that affect gene-expression levels, either through expression quantitative trait locus (eQTL) mapping or through allele-specific expression (ASE) analysis. Given increasing numbers of RNA-seq samples in the public domain, we here studied to what extent eQTLs and ASE effects can be identified when using public RNA-seq data while deriving the genotypes from the RNA-sequencing reads themselves.
METHODS: We downloaded the raw reads for all available human RNA-seq datasets. Using these reads we performed gene expression quantification. All samples were jointly normalized and subjected to strict quality control. We also derived genotypes using the RNA-seq reads and used imputation to infer non-coding variants. This allowed us to perform eQTL mapping and ASE analyses jointly on all samples that passed quality control. Our results were validated using samples for which DNA-seq genotypes were available.
RESULTS: 4,978 public human RNA-seq runs, representing many different tissues and cell types, passed quality control. Even though these data originated from many different laboratories, samples reflecting the same cell type clustered together, suggesting that technical biases due to different sequencing protocols are limited. In a joint analysis on the 1,262 samples with high-quality genotypes, we identified cis-eQTL effects for 8,034 unique genes (at a false discovery rate ≤0.05). eQTL mapping on individual tissues revealed that a limited number of samples already suffice to identify tissue-specific eQTLs for known disease-associated genetic variants. Additionally, we observed strong ASE effects for 34 rare pathogenic variants, corroborating previously observed effects on the corresponding protein levels.
CONCLUSIONS: By deriving and imputing genotypes from RNA-seq data, it is possible to identify both eQTLs and ASE effects. Given the exponential growth of the number of publicly available RNA-seq samples, we expect this approach will become especially relevant for studying the effects of tissue-specific and rare pathogenic genetic variants to aid clinical interpretation of exome and genome sequencing.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13073-015-0152-4) contains supplementary material, which is available to authorized users.
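Illustrative note on the ASE analysis named in the description: the record does not state which statistical test the authors applied. As a sketch only, and assuming a two-sided exact binomial test of reference versus alternative read counts at a heterozygous site against an expected allelic ratio of 0.5, the idea could be expressed as follows. The function name `ase_binomial_p` and all read counts are hypothetical and are not taken from the article.

```python
from math import comb


def ase_binomial_p(ref_reads: int, alt_reads: int, p_null: float = 0.5) -> float:
    """Two-sided exact binomial p-value for allelic imbalance.

    Assumption (not from the record): under no allele-specific expression,
    each read covering a heterozygous site carries the reference allele
    with probability p_null. Intended for modest read depths only.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    # Probability of every possible reference-read count under the null.
    pmf = [comb(n, k) * p_null**k * (1 - p_null) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum the outcomes that are at most as likely as the observed one;
    # the small tolerance guards against floating-point ties.
    total = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical counts: balanced versus strongly skewed expression.
    print(ase_binomial_p(10, 10))
    print(ase_binomial_p(20, 0))
```

A small p-value here would only flag a site as a candidate ASE effect; in practice, reference-mapping bias and multiple-testing correction would also need to be handled, which this sketch does not attempt.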
format Online
Article
Text
id pubmed-4423486
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4423486 2015-05-08 Genome Med Research BioMed Central 2015-03-27 /pmc/articles/PMC4423486/ /pubmed/25954321 http://dx.doi.org/10.1186/s13073-015-0152-4 Text en © Deelen et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Calling genotypes from public RNA-sequencing data enables identification of genetic variants that affect gene-expression levels
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4423486/
https://www.ncbi.nlm.nih.gov/pubmed/25954321
http://dx.doi.org/10.1186/s13073-015-0152-4