Estimation of alternative splicing isoform frequencies from RNA-Seq data

BACKGROUND: Massively parallel whole transcriptome sequencing, commonly referred to as RNA-Seq, is quickly becoming the technology of choice for gene expression profiling. However, due to the short read length delivered by current sequencing technologies, estimation of expression levels for alternative...

Bibliographic Details
Main authors: Nicolae, Marius, Mangul, Serghei, Măndoiu, Ion I, Zelikovsky, Alex
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3107792/
https://www.ncbi.nlm.nih.gov/pubmed/21504602
http://dx.doi.org/10.1186/1748-7188-6-9
author Nicolae, Marius
Mangul, Serghei
Măndoiu, Ion I
Zelikovsky, Alex
collection PubMed
description BACKGROUND: Massively parallel whole transcriptome sequencing, commonly referred to as RNA-Seq, is quickly becoming the technology of choice for gene expression profiling. However, due to the short read length delivered by current sequencing technologies, estimation of expression levels for alternative splicing gene isoforms remains challenging. RESULTS: In this paper we present a novel expectation-maximization algorithm for inference of isoform- and gene-specific expression levels from RNA-Seq data. Our algorithm, referred to as IsoEM, is based on disambiguating information provided by the distribution of insert sizes generated during sequencing library preparation, and takes advantage of base quality scores, strand and read pairing information when available. The open source Java implementation of IsoEM is freely available at http://dna.engr.uconn.edu/software/IsoEM/. CONCLUSIONS: Empirical experiments on both synthetic and real RNA-Seq datasets show that IsoEM has scalable running time and outperforms existing methods of isoform and gene expression level estimation. Simulation experiments confirm previous findings that, for a fixed sequencing cost, using reads longer than 25-36 bases does not necessarily lead to better accuracy for estimating expression levels of annotated isoforms and genes.
format Online
Article
Text
id pubmed-3107792
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3107792 2011-06-04 Estimation of alternative splicing isoform frequencies from RNA-Seq data Nicolae, Marius Mangul, Serghei Măndoiu, Ion I Zelikovsky, Alex Algorithms Mol Biol Research BioMed Central 2011-04-19 /pmc/articles/PMC3107792/ /pubmed/21504602 http://dx.doi.org/10.1186/1748-7188-6-9 Text en Copyright ©2011 Nicolae et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Estimation of alternative splicing isoform frequencies from RNA-Seq data
topic Research
work_keys_str_mv AT nicolaemarius estimationofalternativesplicingisoformfrequenciesfromrnaseqdata
AT mangulserghei estimationofalternativesplicingisoformfrequenciesfromrnaseqdata
AT mandoiuioni estimationofalternativesplicingisoformfrequenciesfromrnaseqdata
AT zelikovskyalex estimationofalternativesplicingisoformfrequenciesfromrnaseqdata
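The abstract describes an expectation-maximization algorithm that resolves reads compatible with several isoforms. The following is a minimal generic sketch of that idea, not the IsoEM implementation: the function name, the input layout, and the use of a single per-read compatibility weight (standing in for insert-size, base-quality, strand, and pairing information) are all assumptions made for illustration.

```python
def estimate_isoform_frequencies(read_weights, lengths, iterations=200):
    """Generic EM sketch for isoform frequencies (hypothetical helper, not IsoEM).

    read_weights: list of dicts, one per read, mapping isoform name to a
                  non-negative compatibility weight (assumed precomputed).
    lengths:      dict mapping isoform name to its length in bases.
    Returns a dict of isoform frequencies that sum to 1.
    """
    isoforms = list(lengths)
    # theta[j] is the fraction of reads originating from isoform j.
    theta = {j: 1.0 / len(isoforms) for j in isoforms}

    for _ in range(iterations):
        # E-step: split each read among its compatible isoforms in
        # proportion to weight * current read fraction.
        counts = {j: 0.0 for j in isoforms}
        for weights in read_weights:
            total = sum(w * theta[j] for j, w in weights.items())
            if total == 0.0:
                continue  # read is incompatible with every isoform
            for j, w in weights.items():
                counts[j] += w * theta[j] / total

        # M-step: re-estimate read fractions from the expected counts.
        assigned = sum(counts.values())
        if assigned == 0.0:
            break
        theta = {j: counts[j] / assigned for j in isoforms}

    # Convert read fractions to isoform frequencies by normalizing for
    # length: longer isoforms produce more reads per molecule.
    per_base = {j: theta[j] / lengths[j] for j in isoforms}
    norm = sum(per_base.values())
    return {j: per_base[j] / norm for j in isoforms}


# Toy input: 3 reads unique to A, 1 unique to B, 4 compatible with both.
reads = [{"A": 1.0}] * 3 + [{"B": 1.0}] + [{"A": 1.0, "B": 1.0}] * 4
freqs = estimate_isoform_frequencies(reads, {"A": 1000, "B": 1000})
```

In this toy input the ambiguous reads are shared according to the evidence from the uniquely mapped reads, which is the disambiguation step the abstract refers to; a real tool would derive the per-read weights from the alignment and library information rather than take them as given.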