
Challenges in estimating percent inclusion of alternatively spliced junctions from RNA-seq data

Transcript quantification is a long-standing problem in genomics and estimating the relative abundance of alternatively-spliced isoforms from the same transcript is an important special case. Both problems have recently been illuminated by high-throughput RNA sequencing experiments which are quickly...


Bibliographic Details
Main Authors:	Kakaradov, Boyko; Xiong, Hui Yuan; Lee, Leo J; Jojic, Nebojsa; Frey, Brendan J
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Proceedings
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3330053/ https://www.ncbi.nlm.nih.gov/pubmed/22537040 http://dx.doi.org/10.1186/1471-2105-13-S6-S11

author	Kakaradov, Boyko; Xiong, Hui Yuan; Lee, Leo J; Jojic, Nebojsa; Frey, Brendan J
collection	PubMed
description	Transcript quantification is a long-standing problem in genomics and estimating the relative abundance of alternatively-spliced isoforms from the same transcript is an important special case. Both problems have recently been illuminated by high-throughput RNA sequencing experiments which are quickly generating large amounts of data. However, much of the signal present in this data is corrupted or obscured by biases resulting in non-uniform and non-proportional representation of sequences from different transcripts. Many existing analyses attempt to deal with these and other biases with various task-specific approaches, which makes direct comparison between them difficult. However, two popular tools for isoform quantification, MISO and Cufflinks, have adopted a general probabilistic framework to model and mitigate these biases in a more general fashion. These advances motivate the need to investigate the effects of RNA-seq biases on the accuracy of different approaches for isoform quantification. We conduct the investigation by building models of increasing sophistication to account for noise introduced by the biases and compare their accuracy to the established approaches. We focus on methods that estimate the expression of alternatively-spliced isoforms with the percent-spliced-in (PSI) metric for each exon skipping event. To improve their estimates, many methods use evidence from RNA-seq reads that align to exon bodies. However, the methods we propose focus on reads that span only exon-exon junctions. As a result, our approaches are simpler and less sensitive to exon definitions than existing methods, which enables us to distinguish their strengths and weaknesses more easily. We present several probabilistic models of position-specific read counts with increasing complexity and compare them to each other and to the current state-of-the-art methods in isoform quantification, MISO and Cufflinks. On a validation set with RT-PCR measurements for 26 cassette events, some of our methods are more accurate and some are significantly more consistent than these two popular tools. This comparison demonstrates the challenges in estimating the percent inclusion of alternatively spliced junctions and illuminates the tradeoffs between different approaches.
format	Online Article Text
id	pubmed-3330053
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3330053 2012-04-20 Challenges in estimating percent inclusion of alternatively spliced junctions from RNA-seq data. Kakaradov, Boyko; Xiong, Hui Yuan; Lee, Leo J; Jojic, Nebojsa; Frey, Brendan J. BMC Bioinformatics. Proceedings. BioMed Central 2012-04-19 /pmc/articles/PMC3330053/ /pubmed/22537040 http://dx.doi.org/10.1186/1471-2105-13-S6-S11 Text en Copyright ©2012 Kakaradov et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Challenges in estimating percent inclusion of alternatively spliced junctions from RNA-seq data
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3330053/ https://www.ncbi.nlm.nih.gov/pubmed/22537040 http://dx.doi.org/10.1186/1471-2105-13-S6-S11
