
Functional regression method for whole genome eQTL epistasis analysis with sequencing data

BACKGROUND: Epistasis plays an essential role in understanding regulation mechanisms and is an essential component of the genetic architecture of gene expression. However, interaction analysis of gene expression remains largely unexplored because of great computational challenges and dat...


Bibliographic Details
Main authors: Xu, Kelin; Jin, Li; Xiong, Momiao
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5436462/
https://www.ncbi.nlm.nih.gov/pubmed/28521784
http://dx.doi.org/10.1186/s12864-017-3777-4
_version_ 1783237411198730240
author Xu, Kelin
Jin, Li
Xiong, Momiao
author_facet Xu, Kelin
Jin, Li
Xiong, Momiao
author_sort Xu, Kelin
collection PubMed
description BACKGROUND: Epistasis plays an essential role in understanding regulation mechanisms and is an essential component of the genetic architecture of gene expression. However, interaction analysis of gene expression remains largely unexplored because of great computational challenges and limited data availability. Due to variation in splicing, transcription start sites, polyadenylation sites, post-transcriptional RNA editing across the entire gene, and transcription rates of the cells, RNA-seq measurements generate large expression variability and collectively create the observed position-level read count curves. A single number for measuring gene expression, which is widely used in microarray-based expression analysis, is highly unlikely to account sufficiently for the large expression variation across the gene. Simultaneously analyzing epistatic architecture using RNA-seq and whole genome sequencing (WGS) data poses enormous challenges. METHODS: We develop a nonlinear functional regression model (FRGM) with functional responses, where the position-level read counts within a gene are taken as a function of genomic position, and functional predictors, where genotype profiles are viewed as a function of genomic position, for epistasis analysis with RNA-seq data. Instead of testing the interaction of all possible pairs of SNPs, the FRGM takes a gene as the basic unit for epistasis analysis: it tests for interaction between all possible pairs of genes and uses all accessible information to collectively test interaction between all possible pairs of SNPs within two genomic regions. RESULTS: Using large-scale simulations, we demonstrate that the proposed FRGM for epistasis analysis achieves the correct type 1 error rate and has higher power to detect interactions between genes than existing methods. The proposed methods are applied to the RNA-seq and WGS data from the 1000 Genomes Project. 
The numbers of pairs of significantly interacting genes after Bonferroni correction identified using FRGM, RPKM and DESeq were 16,2361, 260 and 51, respectively, in the 350 European samples. CONCLUSIONS: The proposed FRGM for epistasis analysis of RNA-seq data can capture isoform and position-level information and should have broad application. Both simulations and real data analysis highlight the potential of the FRGM as a good choice for epistasis analysis with sequencing data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-3777-4) contains supplementary material, which is available to authorized users.
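The Bonferroni correction mentioned in the description can be illustrated with a minimal Python sketch. It assumes every unordered pair of genes is tested exactly once; the function names, the gene labels, and the gene count in the usage note are hypothetical and are not taken from the article.

```python
from math import comb


def bonferroni_threshold(n_genes: int, alpha: float = 0.05) -> float:
    """Per-test threshold when each unordered gene pair is tested once."""
    n_pairs = comb(n_genes, 2)  # number of gene pairs: n * (n - 1) / 2
    if n_pairs == 0:
        raise ValueError("need at least two genes to form a pair")
    return alpha / n_pairs


def significant_pairs(p_values: dict, n_genes: int, alpha: float = 0.05) -> list:
    """Return the gene pairs whose p-value falls below the corrected threshold."""
    threshold = bonferroni_threshold(n_genes, alpha)
    return sorted(pair for pair, p in p_values.items() if p < threshold)
```

With a hypothetical set of 5 genes there are 10 pairs, so a family-wise level of 0.05 gives a per-pair threshold of 0.005; a pair with p = 0.001 would be reported and a pair with p = 0.01 would not.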
format Online
Article
Text
id pubmed-5436462
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5436462 2017-05-19 Functional regression method for whole genome eQTL epistasis analysis with sequencing data Xu, Kelin Jin, Li Xiong, Momiao BMC Genomics Methodology Article BACKGROUND: Epistasis plays an essential role in understanding regulation mechanisms and is an essential component of the genetic architecture of gene expression. However, interaction analysis of gene expression remains largely unexplored because of great computational challenges and limited data availability. Due to variation in splicing, transcription start sites, polyadenylation sites, post-transcriptional RNA editing across the entire gene, and transcription rates of the cells, RNA-seq measurements generate large expression variability and collectively create the observed position-level read count curves. A single number for measuring gene expression, which is widely used in microarray-based expression analysis, is highly unlikely to account sufficiently for the large expression variation across the gene. Simultaneously analyzing epistatic architecture using RNA-seq and whole genome sequencing (WGS) data poses enormous challenges. METHODS: We develop a nonlinear functional regression model (FRGM) with functional responses, where the position-level read counts within a gene are taken as a function of genomic position, and functional predictors, where genotype profiles are viewed as a function of genomic position, for epistasis analysis with RNA-seq data. Instead of testing the interaction of all possible pairs of SNPs, the FRGM takes a gene as the basic unit for epistasis analysis: it tests for interaction between all possible pairs of genes and uses all accessible information to collectively test interaction between all possible pairs of SNPs within two genomic regions. 
RESULTS: Using large-scale simulations, we demonstrate that the proposed FRGM for epistasis analysis achieves the correct type 1 error rate and has higher power to detect interactions between genes than existing methods. The proposed methods are applied to the RNA-seq and WGS data from the 1000 Genomes Project. The numbers of pairs of significantly interacting genes after Bonferroni correction identified using FRGM, RPKM and DESeq were 16,2361, 260 and 51, respectively, in the 350 European samples. CONCLUSIONS: The proposed FRGM for epistasis analysis of RNA-seq data can capture isoform and position-level information and should have broad application. Both simulations and real data analysis highlight the potential of the FRGM as a good choice for epistasis analysis with sequencing data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-3777-4) contains supplementary material, which is available to authorized users. BioMed Central 2017-05-18 /pmc/articles/PMC5436462/ /pubmed/28521784 http://dx.doi.org/10.1186/s12864-017-3777-4 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
Xu, Kelin
Jin, Li
Xiong, Momiao
Functional regression method for whole genome eQTL epistasis analysis with sequencing data
title Functional regression method for whole genome eQTL epistasis analysis with sequencing data
title_sort functional regression method for whole genome eqtl epistasis analysis with sequencing data
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5436462/
https://www.ncbi.nlm.nih.gov/pubmed/28521784
http://dx.doi.org/10.1186/s12864-017-3777-4
work_keys_str_mv AT xukelin functionalregressionmethodforwholegenomeeqtlepistasisanalysiswithsequencingdata
AT jinli functionalregressionmethodforwholegenomeeqtlepistasisanalysiswithsequencingdata
AT xiongmomiao functionalregressionmethodforwholegenomeeqtlepistasisanalysiswithsequencingdata