rnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data

BACKGROUND: The possibility of generating large RNA-sequencing datasets has led to the development of various reference-based and de novo transcriptome assemblers, each with its own strengths and limitations. While reference-based tools are widely used in various transcriptomic studies, their application is...

Bibliographic Details
Main Authors: Bushmanova, Elena, Antipov, Dmitry, Lapidus, Alla, Prjibelski, Andrey D
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6736328/
https://www.ncbi.nlm.nih.gov/pubmed/31494669
http://dx.doi.org/10.1093/gigascience/giz100
author Bushmanova, Elena
Antipov, Dmitry
Lapidus, Alla
Prjibelski, Andrey D
collection PubMed
description BACKGROUND: The possibility of generating large RNA-sequencing datasets has led to the development of various reference-based and de novo transcriptome assemblers, each with its own strengths and limitations. While reference-based tools are widely used in various transcriptomic studies, their application is limited to organisms with finished and well-annotated genomes. De novo transcriptome reconstruction from short reads remains an open and challenging problem, complicated by varying expression levels across genes, alternative splicing, and paralogous genes. RESULTS: Herein we describe the novel transcriptome assembler rnaSPAdes, which has been developed on top of the SPAdes genome assembler and explores computational parallels between the assembly of transcriptomes and single-cell genomes. We also present quality assessment reports for rnaSPAdes assemblies, compare rnaSPAdes with modern transcriptome assembly tools using several evaluation approaches on various RNA-sequencing datasets, and briefly highlight the strong and weak points of the different assemblers. CONCLUSIONS: Based on the comparison of the different assembly methods, we infer that no single assembler leads on all quality metrics across all datasets used. However, rnaSPAdes typically outperforms other assemblers on such important properties as the number of assembled genes and isoforms, while at the same time showing higher accuracy statistics on average than its closest competitors.
format Online
Article
Text
id pubmed-6736328
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6736328 2019-09-16 rnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data Bushmanova, Elena Antipov, Dmitry Lapidus, Alla Prjibelski, Andrey D Gigascience Technical Note BACKGROUND: The possibility of generating large RNA-sequencing datasets has led to the development of various reference-based and de novo transcriptome assemblers, each with its own strengths and limitations. While reference-based tools are widely used in various transcriptomic studies, their application is limited to organisms with finished and well-annotated genomes. De novo transcriptome reconstruction from short reads remains an open and challenging problem, complicated by varying expression levels across genes, alternative splicing, and paralogous genes. RESULTS: Herein we describe the novel transcriptome assembler rnaSPAdes, which has been developed on top of the SPAdes genome assembler and explores computational parallels between the assembly of transcriptomes and single-cell genomes. We also present quality assessment reports for rnaSPAdes assemblies, compare rnaSPAdes with modern transcriptome assembly tools using several evaluation approaches on various RNA-sequencing datasets, and briefly highlight the strong and weak points of the different assemblers. CONCLUSIONS: Based on the comparison of the different assembly methods, we infer that no single assembler leads on all quality metrics across all datasets used. However, rnaSPAdes typically outperforms other assemblers on such important properties as the number of assembled genes and isoforms, while at the same time showing higher accuracy statistics on average than its closest competitors. Oxford University Press 2019-09-03 /pmc/articles/PMC6736328/ /pubmed/31494669 http://dx.doi.org/10.1093/gigascience/giz100 Text en © The Author(s) 2019. Published by Oxford University Press.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title rnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6736328/
https://www.ncbi.nlm.nih.gov/pubmed/31494669
http://dx.doi.org/10.1093/gigascience/giz100