Modeling allele-specific expression at the gene and SNP levels simultaneously by a Bayesian logistic mixed regression model

Bibliographic details
Main authors: Xie, Jing, Ji, Tieming, Ferreira, Marco A. R., Li, Yahan, Patel, Bhaumik N., Rivera, Rocio M.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6819473/
https://www.ncbi.nlm.nih.gov/pubmed/31660858
http://dx.doi.org/10.1186/s12859-019-3141-6
author Xie, Jing
Ji, Tieming
Ferreira, Marco A. R.
Li, Yahan
Patel, Bhaumik N.
Rivera, Rocio M.
collection PubMed
description BACKGROUND: High-throughput sequencing experiments, which can determine allele origins, have been used to assess genome-wide allele-specific expression. Despite the amount of data generated by high-throughput experiments, statistical methods are often too simplistic to capture the complexity of gene expression. Specifically, existing methods do not test, separately and simultaneously, the allele-specific expression (ASE) of a gene as a whole and the variation in ASE across exons within a gene. RESULTS: We propose a generalized linear mixed model to close these gaps, incorporating variation due to genes, single nucleotide polymorphisms (SNPs), and biological replicates. To improve the reliability of statistical inferences, we place priors on each effect in the model so that information is shared across genes in the entire genome. We use Bayesian model selection to test, for each gene, the hypothesis of ASE and the hypothesis of variation in ASE across SNPs within the gene. We apply our method to four tissue types in a bovine study to detect ASE genes in the bovine genome de novo, and we uncover intriguing predictions of regulatory ASE across gene exons and across tissue types. We compare our method with competing approaches through simulation studies that mimic the real datasets. The R package BLMRM, which implements our proposed algorithm, is publicly available for download at https://github.com/JingXieMIZZOU/BLMRM. CONCLUSIONS: We show that the proposed method achieves better control of the false discovery rate and higher power than existing methods when SNP variation and biological variation are present. In addition, our method has low computational requirements, which allow whole-genome analysis.
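The two quantities the abstract separates, ASE of a gene as a whole and variation in ASE across SNPs within that gene, can be illustrated with a small simulation. The sketch below assumes one plausible form of a logistic mixed model for a single gene: a gene-level log-odds `beta`, a normal SNP-level deviation, a normal biological-replicate effect, and binomial allele counts. That exact form, the functions `simulate_gene`, `pooled_log_odds`, and `snp_spread`, and all parameter values are hypothetical illustrations. They are not the BLMRM package's API, and the naive summaries stand in for, rather than reproduce, the Bayesian model selection the authors describe.

```python
import math
import random
import statistics


def expit(x: float) -> float:
    """Inverse logit: map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_gene(beta, sigma_snp, sigma_rep, n_snps, n_reps, depth, rng):
    """Simulate one-allele read counts for a single gene.

    Assumed (illustrative) model, NOT the BLMRM implementation:
        logit(p_jk) = beta + s_j + r_k
        s_j ~ Normal(0, sigma_snp^2)   # SNP-level deviation within the gene
        r_k ~ Normal(0, sigma_rep^2)   # biological-replicate effect
        y_jk ~ Binomial(depth, p_jk)   # reads carrying one allele
    beta != 0 plays the role of gene-level ASE; sigma_snp > 0 plays the
    role of ASE variation across SNPs. Returns an n_snps x n_reps list.
    """
    snp_eff = [rng.gauss(0.0, sigma_snp) for _ in range(n_snps)]
    rep_eff = [rng.gauss(0.0, sigma_rep) for _ in range(n_reps)]
    counts = []
    for j in range(n_snps):
        row = []
        for k in range(n_reps):
            p = expit(beta + snp_eff[j] + rep_eff[k])
            # Binomial draw as a sum of Bernoulli trials (stdlib only).
            row.append(sum(1 for _ in range(depth) if rng.random() < p))
        counts.append(row)
    return counts


def log_odds(successes, trials):
    """Log-odds of a count, with a 0.5 continuity correction to avoid log(0)."""
    return math.log((successes + 0.5) / (trials - successes + 0.5))


def pooled_log_odds(counts, depth):
    """Naive gene-level summary: log-odds of the allele fraction pooled over all cells."""
    total = sum(sum(row) for row in counts)
    trials = depth * sum(len(row) for row in counts)
    return log_odds(total, trials)


def snp_spread(counts, depth):
    """Naive SNP-level summary: population SD of the per-SNP pooled log-odds."""
    per_snp = [log_odds(sum(row), depth * len(row)) for row in counts]
    return statistics.pstdev(per_snp)


if __name__ == "__main__":
    rng = random.Random(2019)
    gene = simulate_gene(beta=1.0, sigma_snp=1.0, sigma_rep=0.3,
                         n_snps=6, n_reps=4, depth=100, rng=rng)
    print("pooled log-odds:", round(pooled_log_odds(gene, 100), 3))
    print("SNP spread:", round(snp_spread(gene, 100), 3))
```

Under these assumptions, a gene simulated with `beta = 0` and `sigma_snp = 0` should give a pooled log-odds near 0 and a small SNP spread; raising `sigma_snp` increases the spread even when `beta` is unchanged, so the two quantities can differ independently of each other. A full analysis would also need to account for the replicate effect and share information across genes, as the abstract describes.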
format Online
Article
Text
id pubmed-6819473
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6819473 2019-10-31 BMC Bioinformatics Methodology Article BioMed Central 2019-10-28 © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Modeling allele-specific expression at the gene and SNP levels simultaneously by a Bayesian logistic mixed regression model
topic Methodology Article