Benchmark analysis of algorithms for determining and quantifying full-length mRNA splice forms from RNA-seq data

Motivation: Because of the advantages of RNA sequencing (RNA-Seq) over microarrays, it is gaining widespread popularity for highly parallel gene expression analysis. For example, RNA-Seq is expected to be able to provide accurate identification and quantification of full-length splice forms. A numbe...

Bibliographic Details
Main authors: Hayer, Katharina E., Pizarro, Angel, Lahens, Nicholas F., Hogenesch, John B., Grant, Gregory R.
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4673975/
https://www.ncbi.nlm.nih.gov/pubmed/26338770
http://dx.doi.org/10.1093/bioinformatics/btv488
author Hayer, Katharina E.
Pizarro, Angel
Lahens, Nicholas F.
Hogenesch, John B.
Grant, Gregory R.
collection PubMed
description Motivation: Because of the advantages of RNA sequencing (RNA-Seq) over microarrays, it is gaining widespread popularity for highly parallel gene expression analysis. For example, RNA-Seq is expected to be able to provide accurate identification and quantification of full-length splice forms. A number of informatics packages have been developed for this purpose, but short reads make it a difficult problem in principle. Sequencing error and polymorphisms add further complications. It has become necessary to perform studies to determine which algorithms perform best and which if any algorithms perform adequately. However, there is a dearth of independent and unbiased benchmarking studies. Here we take an approach using both simulated and experimental benchmark data to evaluate their accuracy. Results: We conclude that most methods are inaccurate even using idealized data, and that no method is highly accurate once multiple splice forms, polymorphisms, intron signal, sequencing errors, alignment errors, annotation errors and other complicating factors are present. These results point to the pressing need for further algorithm development. Availability and implementation: Simulated datasets and other supporting information can be found at http://bioinf.itmat.upenn.edu/BEERS/bp2 Supplementary information: Supplementary data are available at Bioinformatics online. Contact: hayer@upenn.edu
format Online
Article
Text
id pubmed-4673975
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4673975 2015-12-10 Bioinformatics Original Papers Oxford University Press 2015-12-15 2015-09-03 /pmc/articles/PMC4673975/ /pubmed/26338770 http://dx.doi.org/10.1093/bioinformatics/btv488 Text en © The Author 2015. Published by Oxford University Press.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Benchmark analysis of algorithms for determining and quantifying full-length mRNA splice forms from RNA-seq data
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4673975/
https://www.ncbi.nlm.nih.gov/pubmed/26338770
http://dx.doi.org/10.1093/bioinformatics/btv488