Computational approaches for isoform detection and estimation: good and bad news

BACKGROUND: The main goal of the whole transcriptome analysis is to correctly identify all expressed transcripts within a specific cell/tissue - at a particular stage and condition - to determine their structures and to measure their abundances. RNA-seq data promise to allow identification and quant...

Bibliographic Details
Main authors:	Angelini, Claudia, Canditiis, Daniela De, Feis, Italia De
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4098781/ https://www.ncbi.nlm.nih.gov/pubmed/24885830 http://dx.doi.org/10.1186/1471-2105-15-135

_version_	1782326394734247936
author	Angelini, Claudia Canditiis, Daniela De Feis, Italia De
author_facet	Angelini, Claudia Canditiis, Daniela De Feis, Italia De
author_sort	Angelini, Claudia
collection	PubMed
description	BACKGROUND: The main goal of whole-transcriptome analysis is to correctly identify all transcripts expressed within a specific cell or tissue, at a particular stage and condition, to determine their structures, and to measure their abundances. RNA-seq data promise to allow identification and quantification of the transcriptome at an unprecedented level of resolution and accuracy and at low cost. Several computational methods have been proposed for these purposes. However, it is still not clear which promises have already been met and which challenges remain open and require further methodological development. RESULTS: We carried out a simulation study to assess the performance of five widely used tools: CEM, Cufflinks, iReckon, RSEM, and SLIDE. All of them were run with default parameters. In particular, we considered the effect of three scenarios: complete annotation, incomplete annotation, and no annotation at all. Moreover, the methods were compared in three modes of action. In the first mode, the methods were restricted to the isoforms present in the annotation; in the second mode, they were allowed to detect novel isoforms using the annotation as a guide; in the third mode, they operated in a fully data-driven way (although with the support of the alignment to the reference genome). In the third mode, precision and recall are quite poor. By contrast, results are better with the support of the annotation, even when it is incomplete. Finally, the abundance estimation error often shows a very skewed distribution. Performance depends strongly on the true abundance of the isoforms. Lowly (and sometimes also moderately) expressed isoforms are poorly detected and estimated. In particular, lowly expressed isoforms are identified mainly if they are provided in the original annotation as potential isoforms. CONCLUSIONS: Both detection and quantification of all isoforms from RNA-seq data are still hard problems, and they are affected by many factors. Overall, performance changes significantly depending on the mode of action and on the type of available annotation. Methods using complete or partial annotation are able to detect most of the expressed isoforms, even though the number of false positives is often high. Fully data-driven approaches require more attention, at least for complex eukaryotic genomes. Improvements are desirable especially for isoform quantification and for the detection of low-abundance isoforms.
format	Online Article Text
id	pubmed-4098781
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4098781 2014-07-18 BMC Bioinformatics Research Article BioMed Central 2014-05-09 /pmc/articles/PMC4098781/ /pubmed/24885830 http://dx.doi.org/10.1186/1471-2105-15-135 Text en Copyright © 2014 Angelini et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Article Angelini, Claudia Canditiis, Daniela De Feis, Italia De Computational approaches for isoform detection and estimation: good and bad news
title	Computational approaches for isoform detection and estimation: good and bad news
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4098781/ https://www.ncbi.nlm.nih.gov/pubmed/24885830 http://dx.doi.org/10.1186/1471-2105-15-135
