
Compacting and correcting Trinity and Oases RNA-Seq de novo assemblies

BACKGROUND: De novo transcriptome assembly of short reads is now a common step in expression analysis of organisms lacking a reference genome sequence. Several software packages are available to perform this task. Even if their results are of good quality it is still possible to improve them in seve...

Bibliographic Details
Main authors: Cabau, Cédric; Escudié, Frédéric; Djari, Anis; Guiguen, Yann; Bobe, Julien; Klopp, Christophe
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2017
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5316280/
https://www.ncbi.nlm.nih.gov/pubmed/28224052
http://dx.doi.org/10.7717/peerj.2988
author Cabau, Cédric
Escudié, Frédéric
Djari, Anis
Guiguen, Yann
Bobe, Julien
Klopp, Christophe
collection PubMed
description BACKGROUND: De novo transcriptome assembly of short reads is now a common step in expression analysis of organisms lacking a reference genome sequence. Several software packages are available to perform this task. Even if their results are of good quality, it is still possible to improve them in several ways, including redundancy reduction or error correction. Trinity and Oases are two commonly used de novo transcriptome assemblers. The contig sets they produce are of good quality. Still, their compaction (number of contigs needed to represent the transcriptome) and their quality (chimera and nucleotide error rates) can be improved. RESULTS: We built a de novo RNA-Seq Assembly Pipeline (DRAP) which wraps these two assemblers (Trinity and Oases) in order to improve their results regarding the above-mentioned criteria. DRAP reduces the number of resulting contigs by 1.3- to 15-fold, depending on the read set and the assembler used. This article presents seven assembly comparisons showing, in some cases, drastic improvements when using DRAP. DRAP does not significantly impair assembly quality metrics such as read realignment rate or protein reconstruction counts. CONCLUSION: Transcriptome assembly is a challenging computational task. Even though good solutions are already available to end-users, these solutions can still be improved while conserving the overall representation and quality of the assembly. The de novo RNA-Seq Assembly Pipeline (DRAP) is an easy-to-use software package that produces a compact and corrected transcript set. DRAP is free, open-source and available under the GPL V3 license at http://www.sigenae.org/drap.
format Online
Article
Text
id pubmed-5316280
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-5316280 2017-02-21 PeerJ 2017-02-16 /pmc/articles/PMC5316280/ /pubmed/28224052 http://dx.doi.org/10.7717/peerj.2988 Text en ©2017 Cabau et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title Compacting and correcting Trinity and Oases RNA-Seq de novo assemblies
topic Bioinformatics