
A divide-and-conquer algorithm for large-scale de novo transcriptome assembly through combining small assemblies from existing algorithms

BACKGROUND: While the continued development of high-throughput sequencing has facilitated studies of entire transcriptomes in non-model organisms, the incorporation of an increasing number of RNA-Seq libraries has made de novo transcriptome assembly difficult. Although algorithms that can assemble a...


Bibliographic Details
Main authors: Sze, Sing-Hoi, Parrott, Jonathan J., Tarone, Aaron M.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5731495/
https://www.ncbi.nlm.nih.gov/pubmed/29244008
http://dx.doi.org/10.1186/s12864-017-4270-9
_version_ 1783286521116229632
author Sze, Sing-Hoi
Parrott, Jonathan J.
Tarone, Aaron M.
author_facet Sze, Sing-Hoi
Parrott, Jonathan J.
Tarone, Aaron M.
author_sort Sze, Sing-Hoi
collection PubMed
description BACKGROUND: While the continued development of high-throughput sequencing has facilitated studies of entire transcriptomes in non-model organisms, the incorporation of an increasing number of RNA-Seq libraries has made de novo transcriptome assembly difficult. Although algorithms that can assemble a large amount of RNA-Seq data are available, they are generally very memory-intensive and can only be used to construct small assemblies. RESULTS: We develop a divide-and-conquer strategy that allows these algorithms to be utilized by subdividing a large RNA-Seq data set into small libraries. Each individual library is assembled independently by an existing algorithm, and a merging algorithm is developed to combine these assemblies by picking a subset of high-quality transcripts to form a large transcriptome. When compared to existing algorithms that return a single assembly directly, this strategy achieves accuracy comparable to or higher than that of memory-efficient algorithms that can process a large amount of RNA-Seq data, and accuracy comparable to or lower than that of memory-intensive algorithms that can only be used to construct small assemblies. CONCLUSIONS: Our divide-and-conquer strategy allows memory-intensive de novo transcriptome assembly algorithms to be utilized to construct large assemblies.
format Online
Article
Text
id pubmed-5731495
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5731495 2017-12-19 A divide-and-conquer algorithm for large-scale de novo transcriptome assembly through combining small assemblies from existing algorithms Sze, Sing-Hoi Parrott, Jonathan J. Tarone, Aaron M. BMC Genomics Research BACKGROUND: While the continued development of high-throughput sequencing has facilitated studies of entire transcriptomes in non-model organisms, the incorporation of an increasing number of RNA-Seq libraries has made de novo transcriptome assembly difficult. Although algorithms that can assemble a large amount of RNA-Seq data are available, they are generally very memory-intensive and can only be used to construct small assemblies. RESULTS: We develop a divide-and-conquer strategy that allows these algorithms to be utilized by subdividing a large RNA-Seq data set into small libraries. Each individual library is assembled independently by an existing algorithm, and a merging algorithm is developed to combine these assemblies by picking a subset of high-quality transcripts to form a large transcriptome. When compared to existing algorithms that return a single assembly directly, this strategy achieves accuracy comparable to or higher than that of memory-efficient algorithms that can process a large amount of RNA-Seq data, and accuracy comparable to or lower than that of memory-intensive algorithms that can only be used to construct small assemblies. CONCLUSIONS: Our divide-and-conquer strategy allows memory-intensive de novo transcriptome assembly algorithms to be utilized to construct large assemblies.
BioMed Central 2017-12-06 /pmc/articles/PMC5731495/ /pubmed/29244008 http://dx.doi.org/10.1186/s12864-017-4270-9 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Sze, Sing-Hoi
Parrott, Jonathan J.
Tarone, Aaron M.
A divide-and-conquer algorithm for large-scale de novo transcriptome assembly through combining small assemblies from existing algorithms
title A divide-and-conquer algorithm for large-scale de novo transcriptome assembly through combining small assemblies from existing algorithms
title_full A divide-and-conquer algorithm for large-scale de novo transcriptome assembly through combining small assemblies from existing algorithms
title_fullStr A divide-and-conquer algorithm for large-scale de novo transcriptome assembly through combining small assemblies from existing algorithms
title_full_unstemmed A divide-and-conquer algorithm for large-scale de novo transcriptome assembly through combining small assemblies from existing algorithms
title_short A divide-and-conquer algorithm for large-scale de novo transcriptome assembly through combining small assemblies from existing algorithms
title_sort divide-and-conquer algorithm for large-scale de novo transcriptome assembly through combining small assemblies from existing algorithms
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5731495/
https://www.ncbi.nlm.nih.gov/pubmed/29244008
http://dx.doi.org/10.1186/s12864-017-4270-9