Analyzing allele specific RNA expression using mixture models

Bibliographic Details
Main Authors: Lu, Rong; Smith, Ryan M; Seweryn, Michal; Wang, Danxin; Hartmann, Katherine; Webb, Amy; Sadee, Wolfgang; Rempala, Grzegorz A
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4521363/
https://www.ncbi.nlm.nih.gov/pubmed/26231172
http://dx.doi.org/10.1186/s12864-015-1749-0
author Lu, Rong
Smith, Ryan M
Seweryn, Michal
Wang, Danxin
Hartmann, Katherine
Webb, Amy
Sadee, Wolfgang
Rempala, Grzegorz A
author_sort Lu, Rong
collection PubMed
description BACKGROUND: Measuring allele-specific RNA expression provides valuable insights into cis-acting genetic and epigenetic regulation of gene expression. Widespread adoption of high-throughput sequencing technologies for studying RNA expression (RNA-Seq) permits measurement of allelic RNA expression imbalance (AEI) at heterozygous single nucleotide polymorphisms (SNPs) across the entire transcriptome, and this approach has become especially popular with the emergence of large databases, such as GTEx. However, the existing binomial-type methods used to model allelic expression from RNA-seq assume a strong negative correlation between reference and variant allele reads, which may not be reasonable biologically. RESULTS: Here we propose a new strategy for AEI analysis using RNA-seq data. Under the null hypothesis of no AEI, a group of SNPs (possibly across multiple genes) is considered comparable if their respective total sums of the allelic reads are of similar magnitude. Within each group of “comparable” SNPs, we identify SNPs with AEI signal by fitting a mixture of folded Skellam distributions to the absolute values of read differences. By applying this methodology to RNA-Seq data from human autopsy brain tissues, we identified numerous instances of moderate to strong imbalanced allelic RNA expression at heterozygous SNPs. Findings with SLC1A3 mRNA exhibiting known expression differences are discussed as examples. CONCLUSION: The folded Skellam mixture model searches for SNPs with significant difference between reference and variant allele reads (adjusted for different library sizes), using information from a group of “comparable” SNPs across multiple genes. This model is particularly suitable for performing AEI analysis on genes with few heterozygous SNPs available from RNA-seq, and it can fit over-dispersed read counts without specifying the direction of the correlation between reference and variant alleles. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-1749-0) contains supplementary material, which is available to authorized users.
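The abstract describes fitting a mixture of folded Skellam distributions to the absolute read differences within a group of comparable SNPs. The following is a minimal sketch of that idea, not the authors' implementation: it assumes a two-component mixture (one balanced component, one imbalanced component) with fixed, hypothetical Poisson means and mixing weight, and it ignores the library-size adjustment and the parameter-fitting step. All function names and parameter values are illustrative assumptions.

```python
import math


def poisson_logpmf(n, mu):
    """Log of the Poisson(mu) probability of observing the count n (mu > 0)."""
    return n * math.log(mu) - mu - math.lgamma(n + 1)


def skellam_pmf(k, mu1, mu2, terms=400):
    """P(X1 - X2 = k) for independent X1 ~ Poisson(mu1), X2 ~ Poisson(mu2).

    Computed as a truncated convolution; `terms` is an assumed cutoff that is
    adequate for means up to roughly a hundred reads.
    """
    start = max(0, -k)  # X2 = j and X1 = j + k must both be non-negative
    return sum(
        math.exp(poisson_logpmf(j + k, mu1) + poisson_logpmf(j, mu2))
        for j in range(start, start + terms)
    )


def folded_skellam_pmf(d, mu1, mu2):
    """P(|X1 - X2| = d) for d >= 0: fold the Skellam pmf around zero."""
    if d == 0:
        return skellam_pmf(0, mu1, mu2)
    return skellam_pmf(d, mu1, mu2) + skellam_pmf(-d, mu1, mu2)


def posterior_aei(d, weight_aei, null_means, aei_means):
    """Posterior probability that an absolute read difference d came from the
    imbalanced (AEI) component of a two-component folded Skellam mixture."""
    p_null = (1.0 - weight_aei) * folded_skellam_pmf(d, *null_means)
    p_aei = weight_aei * folded_skellam_pmf(d, *aei_means)
    return p_aei / (p_null + p_aei)


# Hypothetical parameters: balanced SNPs average 20 reads per allele,
# imbalanced SNPs average 35 vs 5 reads, and 20% of SNPs show AEI.
null_means = (20.0, 20.0)
aei_means = (35.0, 5.0)
for d in (0, 10, 30):
    print(d, round(posterior_aei(d, 0.2, null_means, aei_means), 4))
```

Because the model works on the absolute difference, it does not need to know which allele is over-expressed. In a real analysis the component means and weights would be estimated from the data (for example by EM) rather than fixed by hand.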
format Online
Article
Text
id pubmed-4521363
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4521363 2015-08-01 Analyzing allele specific RNA expression using mixture models Lu, Rong Smith, Ryan M Seweryn, Michal Wang, Danxin Hartmann, Katherine Webb, Amy Sadee, Wolfgang Rempala, Grzegorz A BMC Genomics Methodology Article BioMed Central 2015-08-01 /pmc/articles/PMC4521363/ /pubmed/26231172 http://dx.doi.org/10.1186/s12864-015-1749-0 Text en © Lu et al. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Analyzing allele specific RNA expression using mixture models
topic Methodology Article