Challenges in estimating percent inclusion of alternatively spliced junctions from RNA-seq data
Main authors: | Kakaradov, Boyko; Xiong, Hui Yuan; Lee, Leo J; Jojic, Nebojsa; Frey, Brendan J |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2012 |
Subjects: | Proceedings |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3330053/ https://www.ncbi.nlm.nih.gov/pubmed/22537040 http://dx.doi.org/10.1186/1471-2105-13-S6-S11 |
_version_ | 1782229924564697088 |
---|---|
author | Kakaradov, Boyko; Xiong, Hui Yuan; Lee, Leo J; Jojic, Nebojsa; Frey, Brendan J |
collection | PubMed |
description | Transcript quantification is a long-standing problem in genomics and estimating the relative abundance of alternatively-spliced isoforms from the same transcript is an important special case. Both problems have recently been illuminated by high-throughput RNA sequencing experiments which are quickly generating large amounts of data. However, much of the signal present in this data is corrupted or obscured by biases resulting in non-uniform and non-proportional representation of sequences from different transcripts. Many existing analyses attempt to deal with these and other biases with various task-specific approaches, which makes direct comparison between them difficult. However, two popular tools for isoform quantification, MISO and Cufflinks, have adopted a general probabilistic framework to model and mitigate these biases in a more general fashion. These advances motivate the need to investigate the effects of RNA-seq biases on the accuracy of different approaches for isoform quantification. We conduct the investigation by building models of increasing sophistication to account for noise introduced by the biases and compare their accuracy to the established approaches. We focus on methods that estimate the expression of alternatively-spliced isoforms with the percent-spliced-in (PSI) metric for each exon skipping event. To improve their estimates, many methods use evidence from RNA-seq reads that align to exon bodies. However, the methods we propose focus on reads that span only exon-exon junctions. As a result, our approaches are simpler and less sensitive to exon definitions than existing methods, which enables us to distinguish their strengths and weaknesses more easily. We present several probabilistic models of position-specific read counts with increasing complexity and compare them to each other and to the current state-of-the-art methods in isoform quantification, MISO and Cufflinks.
On a validation set with RT-PCR measurements for 26 cassette events, some of our methods are more accurate and some are significantly more consistent than these two popular tools. This comparison demonstrates the challenges in estimating the percent inclusion of alternatively spliced junctions and illuminates the tradeoffs between different approaches. |
format | Online Article Text |
id | pubmed-3330053 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3330053 2012-04-20 Challenges in estimating percent inclusion of alternatively spliced junctions from RNA-seq data Kakaradov, Boyko; Xiong, Hui Yuan; Lee, Leo J; Jojic, Nebojsa; Frey, Brendan J BMC Bioinformatics Proceedings BioMed Central 2012-04-19 /pmc/articles/PMC3330053/ /pubmed/22537040 http://dx.doi.org/10.1186/1471-2105-13-S6-S11 Text en Copyright ©2012 Kakaradov et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Challenges in estimating percent inclusion of alternatively spliced junctions from RNA-seq data |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3330053/ https://www.ncbi.nlm.nih.gov/pubmed/22537040 http://dx.doi.org/10.1186/1471-2105-13-S6-S11 |
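The description defines PSI for exon-skipping events and restricts evidence to junction reads, but it gives no formula. The Python sketch below illustrates what a junction-only PSI estimate can look like. It does not reproduce the authors' models, MISO, or Cufflinks. It rests on these assumptions:

1. The inclusion isoform contributes two junctions and the skipping isoform contributes one.
2. Every junction position is sequenced at the same rate, so a read falls on an inclusion junction with probability p = 2ψ/(1 + ψ).
3. The prior on ψ is uniform, and the posterior is approximated on a grid.

The function names and the read counts in the usage example are hypothetical.

```python
import math


def psi_point_estimate(inc_reads: int, exc_reads: int) -> float:
    """Junction-only PSI for a cassette exon (hypothetical helper).

    Assumes the inclusion isoform has two junctions and the skipping isoform
    one, so inclusion counts are halved: PSI = (I/2) / (I/2 + E) = I / (I + 2E).
    """
    if inc_reads < 0 or exc_reads < 0:
        raise ValueError("read counts must be non-negative")
    if inc_reads + exc_reads == 0:
        raise ValueError("no junction reads: PSI is undefined")
    return inc_reads / (inc_reads + 2 * exc_reads)


def psi_grid_posterior(inc_reads: int, exc_reads: int, grid_size: int = 1001):
    """Posterior over PSI under a uniform prior, evaluated on a grid.

    Each junction read is modeled as an independent draw that lands on an
    inclusion junction with probability p = 2*psi / (1 + psi).
    Returns the grid of PSI values and the normalized posterior weights.
    """
    # Cell midpoints keep psi strictly inside (0, 1), so both logs are finite.
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]
    log_post = []
    for psi in grid:
        p = 2 * psi / (1 + psi)
        log_post.append(inc_reads * math.log(p) + exc_reads * math.log(1 - p))
    peak = max(log_post)  # subtract the maximum to avoid underflow in exp
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]


def summarize(grid, weights, level: float = 0.95):
    """Posterior mean and central credible interval from grid weights."""
    mean = sum(g * w for g, w in zip(grid, weights))
    tail = (1 - level) / 2
    cdf, lower, upper = 0.0, None, None
    for g, w in zip(grid, weights):
        cdf += w
        if lower is None and cdf >= tail:
            lower = g
        if upper is None and cdf >= 1 - tail:
            upper = g
    return mean, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 30 reads on the two inclusion junctions combined,
    # 10 reads on the skipping junction.
    print(psi_point_estimate(30, 10))  # 30 / (30 + 20) = 0.6
    grid, weights = psi_grid_posterior(30, 10)
    mean, lower, upper = summarize(grid, weights)
    print(f"posterior mean {mean:.3f}, 95% interval [{lower:.3f}, {upper:.3f}]")
```

Under these assumptions, let I be the number of inclusion-junction reads and E the number of skipping-junction reads. The point estimate reduces to I / (I + 2E), which is also the maximum-likelihood value of ψ. The grid posterior adds an uncertainty interval that narrows as junction coverage grows. The sketch deliberately ignores the position-specific biases that the description identifies as the central difficulty.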