Comparative assessment of methods for the computational inference of transcript isoform abundance from RNA-seq data
BACKGROUND: Understanding the regulation of gene expression, including transcription start site usage, alternative splicing, and polyadenylation, requires accurate quantification of expression levels down to the level of individual transcript isoforms. To comparatively evaluate the accuracy of the m...
Main authors: | Kanitz, Alexander; Gypas, Foivos; Gruber, Andreas J.; Gruber, Andreas R.; Martin, Georges; Zavolan, Mihaela |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4511015/ https://www.ncbi.nlm.nih.gov/pubmed/26201343 http://dx.doi.org/10.1186/s13059-015-0702-5 |
_version_ | 1782382277643206656 |
---|---|
author | Kanitz, Alexander Gypas, Foivos Gruber, Andreas J. Gruber, Andreas R. Martin, Georges Zavolan, Mihaela |
author_facet | Kanitz, Alexander Gypas, Foivos Gruber, Andreas J. Gruber, Andreas R. Martin, Georges Zavolan, Mihaela |
author_sort | Kanitz, Alexander |
collection | PubMed |
description | BACKGROUND: Understanding the regulation of gene expression, including transcription start site usage, alternative splicing, and polyadenylation, requires accurate quantification of expression levels down to the level of individual transcript isoforms. To comparatively evaluate the accuracy of the many methods that have been proposed for estimating transcript isoform abundance from RNA sequencing data, we have used both synthetic data as well as an independent experimental method for quantifying the abundance of transcript ends at the genome-wide level. RESULTS: We found that many tools have good accuracy and yield better estimates of gene-level expression compared to commonly used count-based approaches, but they vary widely in memory and runtime requirements. Nucleotide composition and intron/exon structure have comparatively little influence on the accuracy of expression estimates, which correlates most strongly with transcript/gene expression levels. To facilitate the reproduction and further extension of our study, we provide datasets, source code, and an online analysis tool on a companion website, where developers can upload expression estimates obtained with their own tool to compare them to those inferred by the methods assessed here. CONCLUSIONS: As many methods for quantifying isoform abundance with comparable accuracy are available, a user’s choice will likely be determined by factors such as the memory and runtime requirements, as well as the availability of methods for downstream analyses. Sequencing-based methods to quantify the abundance of specific transcript regions could complement validation schemes based on synthetic data and quantitative PCR in future or ongoing assessments of RNA-seq analysis methods. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13059-015-0702-5) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4511015 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4511015 2015-07-23 Comparative assessment of methods for the computational inference of transcript isoform abundance from RNA-seq data Kanitz, Alexander Gypas, Foivos Gruber, Andreas J. Gruber, Andreas R. Martin, Georges Zavolan, Mihaela Genome Biol Research BACKGROUND: Understanding the regulation of gene expression, including transcription start site usage, alternative splicing, and polyadenylation, requires accurate quantification of expression levels down to the level of individual transcript isoforms. To comparatively evaluate the accuracy of the many methods that have been proposed for estimating transcript isoform abundance from RNA sequencing data, we have used both synthetic data as well as an independent experimental method for quantifying the abundance of transcript ends at the genome-wide level. RESULTS: We found that many tools have good accuracy and yield better estimates of gene-level expression compared to commonly used count-based approaches, but they vary widely in memory and runtime requirements. Nucleotide composition and intron/exon structure have comparatively little influence on the accuracy of expression estimates, which correlates most strongly with transcript/gene expression levels. To facilitate the reproduction and further extension of our study, we provide datasets, source code, and an online analysis tool on a companion website, where developers can upload expression estimates obtained with their own tool to compare them to those inferred by the methods assessed here. CONCLUSIONS: As many methods for quantifying isoform abundance with comparable accuracy are available, a user’s choice will likely be determined by factors such as the memory and runtime requirements, as well as the availability of methods for downstream analyses. Sequencing-based methods to quantify the abundance of specific transcript regions could complement validation schemes based on synthetic data and quantitative PCR in future or ongoing assessments of RNA-seq analysis methods. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13059-015-0702-5) contains supplementary material, which is available to authorized users. BioMed Central 2015-07-23 2015 /pmc/articles/PMC4511015/ /pubmed/26201343 http://dx.doi.org/10.1186/s13059-015-0702-5 Text en © Kanitz et al. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Kanitz, Alexander Gypas, Foivos Gruber, Andreas J. Gruber, Andreas R. Martin, Georges Zavolan, Mihaela Comparative assessment of methods for the computational inference of transcript isoform abundance from RNA-seq data |
title | Comparative assessment of methods for the computational inference of transcript isoform abundance from RNA-seq data |
title_full | Comparative assessment of methods for the computational inference of transcript isoform abundance from RNA-seq data |
title_fullStr | Comparative assessment of methods for the computational inference of transcript isoform abundance from RNA-seq data |
title_full_unstemmed | Comparative assessment of methods for the computational inference of transcript isoform abundance from RNA-seq data |
title_short | Comparative assessment of methods for the computational inference of transcript isoform abundance from RNA-seq data |
title_sort | comparative assessment of methods for the computational inference of transcript isoform abundance from rna-seq data |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4511015/ https://www.ncbi.nlm.nih.gov/pubmed/26201343 http://dx.doi.org/10.1186/s13059-015-0702-5 |
work_keys_str_mv | AT kanitzalexander comparativeassessmentofmethodsforthecomputationalinferenceoftranscriptisoformabundancefromrnaseqdata AT gypasfoivos comparativeassessmentofmethodsforthecomputationalinferenceoftranscriptisoformabundancefromrnaseqdata AT gruberandreasj comparativeassessmentofmethodsforthecomputationalinferenceoftranscriptisoformabundancefromrnaseqdata AT gruberandreasr comparativeassessmentofmethodsforthecomputationalinferenceoftranscriptisoformabundancefromrnaseqdata AT martingeorges comparativeassessmentofmethodsforthecomputationalinferenceoftranscriptisoformabundancefromrnaseqdata AT zavolanmihaela comparativeassessmentofmethodsforthecomputationalinferenceoftranscriptisoformabundancefromrnaseqdata |