
A Bayesian approach for estimating allele-specific expression from RNA-Seq data with diploid genomes

BACKGROUND: RNA-sequencing (RNA-Seq) has become a popular tool for transcriptome profiling in mammals. However, accurate estimation of allele-specific expression (ASE) based on alignments of reads to the reference genome is challenging, because it contains only one allele on a mosaic haploid genome....


Bibliographic Details
Main authors: Nariai, Naoki; Kojima, Kaname; Mimori, Takahiro; Kawai, Yosuke; Nagasaki, Masao
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4895278/
https://www.ncbi.nlm.nih.gov/pubmed/26818838
http://dx.doi.org/10.1186/s12864-015-2295-5
author Nariai, Naoki
Kojima, Kaname
Mimori, Takahiro
Kawai, Yosuke
Nagasaki, Masao
collection PubMed
description BACKGROUND: RNA-sequencing (RNA-Seq) has become a popular tool for transcriptome profiling in mammals. However, accurate estimation of allele-specific expression (ASE) based on alignments of reads to the reference genome is challenging, because the reference contains only one allele on a mosaic haploid genome. Even with the information of diploid genome sequences, precise alignment of reads to the correct allele is difficult because of the high similarity between the corresponding allele sequences.
RESULTS: We propose a Bayesian approach to estimate ASE from RNA-Seq data with diploid genome sequences. In the statistical framework, the haploid choice is modeled as a hidden variable and estimated simultaneously with isoform expression levels by variational Bayesian inference. Through analysis of simulation data, we demonstrate the effectiveness of the proposed approach in identifying ASE compared to the existing approach. We also show that our approach enables better quantification of isoform expression levels compared to the existing methods TIGAR2, RSEM and Cufflinks. In the real data analysis of the human reference lymphoblastoid cell line GM12878, some autosomal genes were identified as ASE genes, and skewed paternal X-chromosome inactivation in GM12878 was identified.
CONCLUSIONS: The proposed method, called ASE-TIGAR, enables accurate estimation of gene expression from RNA-Seq data in an allele-specific manner. Our results show the effectiveness of utilizing personal genomic information for accurate estimation of ASE. An implementation of our method is available at http://nagasakilab.csml.org/ase-tigar.
format Online
Article
Text
id pubmed-4895278
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4895278 2016-06-10 A Bayesian approach for estimating allele-specific expression from RNA-Seq data with diploid genomes Nariai, Naoki Kojima, Kaname Mimori, Takahiro Kawai, Yosuke Nagasaki, Masao BMC Genomics Proceedings
BioMed Central 2016-01-11 /pmc/articles/PMC4895278/ /pubmed/26818838 http://dx.doi.org/10.1186/s12864-015-2295-5 Text en © Nariai et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A Bayesian approach for estimating allele-specific expression from RNA-Seq data with diploid genomes
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4895278/
https://www.ncbi.nlm.nih.gov/pubmed/26818838
http://dx.doi.org/10.1186/s12864-015-2295-5