Challenges in estimating percent inclusion of alternatively spliced junctions from RNA-seq data

Transcript quantification is a long-standing problem in genomics and estimating the relative abundance of alternatively-spliced isoforms from the same transcript is an important special case. Both problems have recently been illuminated by high-throughput RNA sequencing experiments which are quickly...

Bibliographic Details
Main authors: Kakaradov, Boyko; Xiong, Hui Yuan; Lee, Leo J; Jojic, Nebojsa; Frey, Brendan J
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3330053/
https://www.ncbi.nlm.nih.gov/pubmed/22537040
http://dx.doi.org/10.1186/1471-2105-13-S6-S11
_version_ 1782229924564697088
author Kakaradov, Boyko
Xiong, Hui Yuan
Lee, Leo J
Jojic, Nebojsa
Frey, Brendan J
author_facet Kakaradov, Boyko
Xiong, Hui Yuan
Lee, Leo J
Jojic, Nebojsa
Frey, Brendan J
author_sort Kakaradov, Boyko
collection PubMed
description Transcript quantification is a long-standing problem in genomics and estimating the relative abundance of alternatively-spliced isoforms from the same transcript is an important special case. Both problems have recently been illuminated by high-throughput RNA sequencing experiments which are quickly generating large amounts of data. However, much of the signal present in this data is corrupted or obscured by biases resulting in non-uniform and non-proportional representation of sequences from different transcripts. Many existing analyses attempt to deal with these and other biases with various task-specific approaches, which makes direct comparison between them difficult. However, two popular tools for isoform quantification, MISO and Cufflinks, have adopted a general probabilistic framework to model and mitigate these biases in a more general fashion. These advances motivate the need to investigate the effects of RNA-seq biases on the accuracy of different approaches for isoform quantification. We conduct the investigation by building models of increasing sophistication to account for noise introduced by the biases and compare their accuracy to the established approaches. We focus on methods that estimate the expression of alternatively-spliced isoforms with the percent-spliced-in (PSI) metric for each exon skipping event. To improve their estimates, many methods use evidence from RNA-seq reads that align to exon bodies. However, the methods we propose focus on reads that span only exon-exon junctions. As a result, our approaches are simpler and less sensitive to exon definitions than existing methods, which enables us to distinguish their strengths and weaknesses more easily. We present several probabilistic models of position-specific read counts with increasing complexity and compare them to each other and to the current state-of-the-art methods in isoform quantification, MISO and Cufflinks.
On a validation set with RT-PCR measurements for 26 cassette events, some of our methods are more accurate and some are significantly more consistent than these two popular tools. This comparison demonstrates the challenges in estimating the percent inclusion of alternatively spliced junctions and illuminates the tradeoffs between different approaches.
format Online
Article
Text
id pubmed-3330053
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-33300532012-04-20 Challenges in estimating percent inclusion of alternatively spliced junctions from RNA-seq data Kakaradov, Boyko Xiong, Hui Yuan Lee, Leo J Jojic, Nebojsa Frey, Brendan J BMC Bioinformatics Proceedings Transcript quantification is a long-standing problem in genomics and estimating the relative abundance of alternatively-spliced isoforms from the same transcript is an important special case. Both problems have recently been illuminated by high-throughput RNA sequencing experiments which are quickly generating large amounts of data. However, much of the signal present in this data is corrupted or obscured by biases resulting in non-uniform and non-proportional representation of sequences from different transcripts. Many existing analyses attempt to deal with these and other biases with various task-specific approaches, which makes direct comparison between them difficult. However, two popular tools for isoform quantification, MISO and Cufflinks, have adopted a general probabilistic framework to model and mitigate these biases in a more general fashion. These advances motivate the need to investigate the effects of RNA-seq biases on the accuracy of different approaches for isoform quantification. We conduct the investigation by building models of increasing sophistication to account for noise introduced by the biases and compare their accuracy to the established approaches. We focus on methods that estimate the expression of alternatively-spliced isoforms with the percent-spliced-in (PSI) metric for each exon skipping event. To improve their estimates, many methods use evidence from RNA-seq reads that align to exon bodies. However, the methods we propose focus on reads that span only exon-exon junctions. As a result, our approaches are simpler and less sensitive to exon definitions than existing methods, which enables us to distinguish their strengths and weaknesses more easily. 
We present several probabilistic models of position-specific read counts with increasing complexity and compare them to each other and to the current state-of-the-art methods in isoform quantification, MISO and Cufflinks. On a validation set with RT-PCR measurements for 26 cassette events, some of our methods are more accurate and some are significantly more consistent than these two popular tools. This comparison demonstrates the challenges in estimating the percent inclusion of alternatively spliced junctions and illuminates the tradeoffs between different approaches. BioMed Central 2012-04-19 /pmc/articles/PMC3330053/ /pubmed/22537040 http://dx.doi.org/10.1186/1471-2105-13-S6-S11 Text en Copyright ©2012 Kakaradov et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Proceedings
Kakaradov, Boyko
Xiong, Hui Yuan
Lee, Leo J
Jojic, Nebojsa
Frey, Brendan J
Challenges in estimating percent inclusion of alternatively spliced junctions from RNA-seq data
title Challenges in estimating percent inclusion of alternatively spliced junctions from RNA-seq data
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3330053/
https://www.ncbi.nlm.nih.gov/pubmed/22537040
http://dx.doi.org/10.1186/1471-2105-13-S6-S11