Functional regression method for whole genome eQTL epistasis analysis with sequencing data
BACKGROUND: Epistasis plays an essential role in understanding regulatory mechanisms and is an essential component of the genetic architecture of gene expression. However, interaction analysis of gene expression remains fundamentally unexplored due to great computational challenges and dat...
Main Authors: | Xu, Kelin; Jin, Li; Xiong, Momiao |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | Methodology Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5436462/ https://www.ncbi.nlm.nih.gov/pubmed/28521784 http://dx.doi.org/10.1186/s12864-017-3777-4 |
_version_ | 1783237411198730240 |
---|---|
author | Xu, Kelin Jin, Li Xiong, Momiao |
author_facet | Xu, Kelin Jin, Li Xiong, Momiao |
author_sort | Xu, Kelin |
collection | PubMed |
description | BACKGROUND: Epistasis plays an essential role in understanding regulatory mechanisms and is an essential component of the genetic architecture of gene expression. However, interaction analysis of gene expression remains fundamentally unexplored due to great computational challenges and limited data availability. Due to variation in splicing, transcription start sites, polyadenylation sites, and post-transcriptional RNA editing across the entire gene, and in the transcription rates of cells, RNA-seq measurements show large expression variability, which collectively creates the observed position-level read count curves. A single number measuring gene expression, as widely used in microarray-based gene expression analysis, is highly unlikely to sufficiently account for large expression variation across the gene. Simultaneously analyzing epistatic architecture using RNA-seq and whole genome sequencing (WGS) data poses enormous challenges. METHODS: We develop a nonlinear functional regression model (FRGM) for epistasis analysis with RNA-seq data, with functional responses, where the position-level read counts within a gene are taken as a function of genomic position, and functional predictors, where genotype profiles are viewed as a function of genomic position. Instead of testing the interaction of all possible pairs of SNPs, the FRGM takes a gene as the basic unit for epistasis analysis: it tests for interaction between all possible pairs of genes and uses all accessible information to collectively test interaction between all possible pairs of SNPs within two genomic regions. RESULTS: By large-scale simulations, we demonstrate that the proposed FRGM for epistasis analysis achieves the correct type 1 error rate and has higher power to detect interactions between genes than existing methods. The proposed methods are applied to RNA-seq and WGS data from the 1000 Genomes Project. The numbers of pairs of significantly interacting genes after Bonferroni correction identified using FRGM, RPKM and DESeq were 16,2361, 260 and 51, respectively, from the 350 European samples. CONCLUSIONS: The proposed FRGM for epistasis analysis of RNA-seq data can capture isoform and position-level information and will have broad application. Both simulations and real data analysis highlight the potential of the FRGM as a good choice for epistasis analysis with sequencing data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-3777-4) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5436462 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5436462 2017-05-19 Functional regression method for whole genome eQTL epistasis analysis with sequencing data Xu, Kelin Jin, Li Xiong, Momiao BMC Genomics Methodology Article BACKGROUND: Epistasis plays an essential role in understanding regulatory mechanisms and is an essential component of the genetic architecture of gene expression. However, interaction analysis of gene expression remains fundamentally unexplored due to great computational challenges and limited data availability. Due to variation in splicing, transcription start sites, polyadenylation sites, and post-transcriptional RNA editing across the entire gene, and in the transcription rates of cells, RNA-seq measurements show large expression variability, which collectively creates the observed position-level read count curves. A single number measuring gene expression, as widely used in microarray-based gene expression analysis, is highly unlikely to sufficiently account for large expression variation across the gene. Simultaneously analyzing epistatic architecture using RNA-seq and whole genome sequencing (WGS) data poses enormous challenges. METHODS: We develop a nonlinear functional regression model (FRGM) for epistasis analysis with RNA-seq data, with functional responses, where the position-level read counts within a gene are taken as a function of genomic position, and functional predictors, where genotype profiles are viewed as a function of genomic position. Instead of testing the interaction of all possible pairs of SNPs, the FRGM takes a gene as the basic unit for epistasis analysis: it tests for interaction between all possible pairs of genes and uses all accessible information to collectively test interaction between all possible pairs of SNPs within two genomic regions. RESULTS: By large-scale simulations, we demonstrate that the proposed FRGM for epistasis analysis achieves the correct type 1 error rate and has higher power to detect interactions between genes than existing methods. The proposed methods are applied to RNA-seq and WGS data from the 1000 Genomes Project. The numbers of pairs of significantly interacting genes after Bonferroni correction identified using FRGM, RPKM and DESeq were 16,2361, 260 and 51, respectively, from the 350 European samples. CONCLUSIONS: The proposed FRGM for epistasis analysis of RNA-seq data can capture isoform and position-level information and will have broad application. Both simulations and real data analysis highlight the potential of the FRGM as a good choice for epistasis analysis with sequencing data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-3777-4) contains supplementary material, which is available to authorized users. BioMed Central 2017-05-18 /pmc/articles/PMC5436462/ /pubmed/28521784 http://dx.doi.org/10.1186/s12864-017-3777-4 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Xu, Kelin Jin, Li Xiong, Momiao Functional regression method for whole genome eQTL epistasis analysis with sequencing data |
title | Functional regression method for whole genome eQTL epistasis analysis with sequencing data |
title_full | Functional regression method for whole genome eQTL epistasis analysis with sequencing data |
title_fullStr | Functional regression method for whole genome eQTL epistasis analysis with sequencing data |
title_full_unstemmed | Functional regression method for whole genome eQTL epistasis analysis with sequencing data |
title_short | Functional regression method for whole genome eQTL epistasis analysis with sequencing data |
title_sort | functional regression method for whole genome eqtl epistasis analysis with sequencing data |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5436462/ https://www.ncbi.nlm.nih.gov/pubmed/28521784 http://dx.doi.org/10.1186/s12864-017-3777-4 |
work_keys_str_mv | AT xukelin functionalregressionmethodforwholegenomeeqtlepistasisanalysiswithsequencingdata AT jinli functionalregressionmethodforwholegenomeeqtlepistasisanalysiswithsequencingdata AT xiongmomiao functionalregressionmethodforwholegenomeeqtlepistasisanalysiswithsequencingdata |