A comparative study of RNA-seq analysis strategies

Three principal approaches have been proposed for inferring the set of transcripts expressed in RNA samples using RNA-seq. The simplest approach uses curated annotations, which assumes the transcripts in a sample are a subset of the transcripts listed in a curated database. A more ambitious method i...

Bibliographic Details
Main Authors: Jänes, Jürgen, Hu, Fengyuan, Lewin, Alexandra, Turro, Ernest
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4652615/
https://www.ncbi.nlm.nih.gov/pubmed/25788326
http://dx.doi.org/10.1093/bib/bbv007
author Jänes, Jürgen
Hu, Fengyuan
Lewin, Alexandra
Turro, Ernest
collection PubMed
description Three principal approaches have been proposed for inferring the set of transcripts expressed in RNA samples using RNA-seq. The simplest approach uses curated annotations and assumes that the transcripts in a sample are a subset of the transcripts listed in a curated database. A more ambitious method involves aligning reads to a reference genome and using the alignments to infer the transcript structures, possibly with the aid of a curated transcript database. The most challenging approach is to assemble reads into putative transcripts de novo without the aid of reference data. We have systematically assessed the properties of these three approaches through a simulation study. We have found that the sensitivity of computational transcript set estimation is severely limited. Computational approaches (both genome-guided and de novo assembly) produce a large number of artefacts, which are assigned large expression estimates and absorb a substantial proportion of the signal when performing expression analysis. The approach using curated annotations shows good expression correlation even when the annotations are incomplete. Furthermore, any incorrect transcripts present in a curated set do not absorb much signal, so it is preferable to have a curated set with high sensitivity rather than high precision. Software to simulate transcript sets, expression values and sequence reads under a wider range of parameter values and to compare sensitivity, precision and signal-to-noise ratios of different methods is freely available online (https://github.com/boboppie/RSSS) and can be expanded by interested parties to include methods other than the exemplars presented in this article.
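The sensitivity and precision mentioned in the description can be illustrated with a minimal sketch. It is not taken from the RSSS software. It assumes that transcripts are matched by identifier (a real comparison would have to match transcript structures), and the function names, transcript IDs and expression values are all hypothetical.

```python
def sensitivity_precision(true_set, estimated_set):
    """Return (sensitivity, precision) of an estimated transcript set.

    Assumption: transcripts are compared by identifier only.
    """
    true_set = set(true_set)
    estimated_set = set(estimated_set)
    true_positives = len(true_set & estimated_set)
    # Sensitivity: fraction of truly expressed transcripts that were recovered.
    sensitivity = true_positives / len(true_set) if true_set else 0.0
    # Precision: fraction of reported transcripts that are truly expressed.
    precision = true_positives / len(estimated_set) if estimated_set else 0.0
    return sensitivity, precision


def artefact_signal_fraction(expression, true_set):
    """Fraction of total estimated expression assigned to transcripts
    that are not in the true set (that is, to artefacts)."""
    total = sum(expression.values())
    if total == 0:
        return 0.0
    artefact = sum(v for t, v in expression.items() if t not in true_set)
    return artefact / total


# Hypothetical toy data: four true transcripts, three reported ones,
# of which "X1" is an artefact.
truth = {"T1", "T2", "T3", "T4"}
estimated = {"T1", "T2", "X1"}
expression = {"T1": 10.0, "T2": 30.0, "X1": 10.0}

sens, prec = sensitivity_precision(truth, estimated)
absorbed = artefact_signal_fraction(expression, truth)
print(sens, prec, absorbed)
```

In this toy case two of the four true transcripts are recovered and one of the three reported transcripts is an artefact that takes a share of the total expression signal, which is the kind of effect the description attributes to genome-guided and de novo assembly.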
format Online
Article
Text
id pubmed-4652615
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4652615 2015-11-25 A comparative study of RNA-seq analysis strategies Jänes, Jürgen Hu, Fengyuan Lewin, Alexandra Turro, Ernest Brief Bioinform Papers
Oxford University Press 2015-11 2015-03-18 /pmc/articles/PMC4652615/ /pubmed/25788326 http://dx.doi.org/10.1093/bib/bbv007 Text en © The Author 2015. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title A comparative study of RNA-seq analysis strategies
topic Papers