A divide-and-conquer algorithm for large-scale de novo transcriptome assembly through combining small assemblies from existing algorithms
BACKGROUND: While the continued development of high-throughput sequencing has facilitated studies of entire transcriptomes in non-model organisms, the incorporation of an increasing amount of RNA-Seq libraries has made de novo transcriptome assembly difficult. Although algorithms that can assemble a...
Main authors: | Sze, Sing-Hoi; Parrott, Jonathan J.; Tarone, Aaron M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5731495/ https://www.ncbi.nlm.nih.gov/pubmed/29244008 http://dx.doi.org/10.1186/s12864-017-4270-9 |
_version_ | 1783286521116229632 |
---|---|
author | Sze, Sing-Hoi Parrott, Jonathan J. Tarone, Aaron M. |
author_facet | Sze, Sing-Hoi Parrott, Jonathan J. Tarone, Aaron M. |
author_sort | Sze, Sing-Hoi |
collection | PubMed |
description | BACKGROUND: While the continued development of high-throughput sequencing has facilitated studies of entire transcriptomes in non-model organisms, the incorporation of an increasing amount of RNA-Seq libraries has made de novo transcriptome assembly difficult. Although algorithms that can assemble a large amount of RNA-Seq data are available, they are generally very memory-intensive and can only be used to construct small assemblies. RESULTS: We develop a divide-and-conquer strategy that allows these algorithms to be utilized, by subdividing a large RNA-Seq data set into small libraries. Each individual library is assembled independently by an existing algorithm, and a merging algorithm is developed to combine these assemblies by picking a subset of high quality transcripts to form a large transcriptome. When compared to existing algorithms that return a single assembly directly, this strategy achieves comparable or increased accuracy as memory-efficient algorithms that can be used to process a large amount of RNA-Seq data, and comparable or decreased accuracy as memory-intensive algorithms that can only be used to construct small assemblies. CONCLUSIONS: Our divide-and-conquer strategy allows memory-intensive de novo transcriptome assembly algorithms to be utilized to construct large assemblies. |
format | Online Article Text |
id | pubmed-5731495 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-57314952017-12-19 A divide-and-conquer algorithm for large-scale de novo transcriptome assembly through combining small assemblies from existing algorithms Sze, Sing-Hoi Parrott, Jonathan J. Tarone, Aaron M. BMC Genomics Research BACKGROUND: While the continued development of high-throughput sequencing has facilitated studies of entire transcriptomes in non-model organisms, the incorporation of an increasing amount of RNA-Seq libraries has made de novo transcriptome assembly difficult. Although algorithms that can assemble a large amount of RNA-Seq data are available, they are generally very memory-intensive and can only be used to construct small assemblies. RESULTS: We develop a divide-and-conquer strategy that allows these algorithms to be utilized, by subdividing a large RNA-Seq data set into small libraries. Each individual library is assembled independently by an existing algorithm, and a merging algorithm is developed to combine these assemblies by picking a subset of high quality transcripts to form a large transcriptome. When compared to existing algorithms that return a single assembly directly, this strategy achieves comparable or increased accuracy as memory-efficient algorithms that can be used to process a large amount of RNA-Seq data, and comparable or decreased accuracy as memory-intensive algorithms that can only be used to construct small assemblies. CONCLUSIONS: Our divide-and-conquer strategy allows memory-intensive de novo transcriptome assembly algorithms to be utilized to construct large assemblies. 
BioMed Central 2017-12-06 /pmc/articles/PMC5731495/ /pubmed/29244008 http://dx.doi.org/10.1186/s12864-017-4270-9 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Sze, Sing-Hoi Parrott, Jonathan J. Tarone, Aaron M. A divide-and-conquer algorithm for large-scale de novo transcriptome assembly through combining small assemblies from existing algorithms |
title | A divide-and-conquer algorithm for large-scale de novo transcriptome assembly through combining small assemblies from existing algorithms |
title_sort | divide-and-conquer algorithm for large-scale de novo transcriptome assembly through combining small assemblies from existing algorithms |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5731495/ https://www.ncbi.nlm.nih.gov/pubmed/29244008 http://dx.doi.org/10.1186/s12864-017-4270-9 |
work_keys_str_mv | AT szesinghoi adivideandconqueralgorithmforlargescaledenovotranscriptomeassemblythroughcombiningsmallassembliesfromexistingalgorithms AT parrottjonathanj adivideandconqueralgorithmforlargescaledenovotranscriptomeassemblythroughcombiningsmallassembliesfromexistingalgorithms AT taroneaaronm adivideandconqueralgorithmforlargescaledenovotranscriptomeassemblythroughcombiningsmallassembliesfromexistingalgorithms AT szesinghoi divideandconqueralgorithmforlargescaledenovotranscriptomeassemblythroughcombiningsmallassembliesfromexistingalgorithms AT parrottjonathanj divideandconqueralgorithmforlargescaledenovotranscriptomeassemblythroughcombiningsmallassembliesfromexistingalgorithms AT taroneaaronm divideandconqueralgorithmforlargescaledenovotranscriptomeassemblythroughcombiningsmallassembliesfromexistingalgorithms |
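
The description above outlines a three-step strategy: subdivide a large RNA-Seq data set into small libraries, assemble each library independently with an existing algorithm, and merge the assemblies by picking a subset of high-quality transcripts. The following is a minimal sketch of that control flow only, not the authors' implementation. All names (`split_into_libraries`, `merge_assemblies`, `divide_and_conquer_assembly`, `toy_assembler`) are hypothetical, and the merging rule used here (keep the longest transcripts and drop any that is contained in one already kept) is a placeholder assumption, since the record does not say how transcript quality is judged.

```python
from typing import Callable, Iterable, List, Sequence


def split_into_libraries(reads: Sequence[str], library_size: int) -> List[List[str]]:
    """Divide step: cut the read set into consecutive chunks of at most library_size reads."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return [list(reads[i:i + library_size]) for i in range(0, len(reads), library_size)]


def merge_assemblies(assemblies: Iterable[Iterable[str]], min_length: int = 1) -> List[str]:
    """Combine step (ASSUMED rule, not taken from the paper).

    Visit candidate transcripts from longest to shortest and keep one only
    if it is not a substring of a transcript that was already kept.
    """
    candidates = sorted(
        {t for assembly in assemblies for t in assembly if len(t) >= min_length},
        key=lambda t: (-len(t), t),  # longest first, ties broken alphabetically
    )
    kept: List[str] = []
    for transcript in candidates:
        if not any(transcript in longer for longer in kept):
            kept.append(transcript)
    return kept


def divide_and_conquer_assembly(
    reads: Sequence[str],
    assemble: Callable[[List[str]], List[str]],
    library_size: int,
) -> List[str]:
    """Split the reads, run the supplied assembler on each library, merge the results."""
    libraries = split_into_libraries(reads, library_size)
    assemblies = [assemble(library) for library in libraries]
    return merge_assemblies(assemblies)


def toy_assembler(library: List[str]) -> List[str]:
    """Stand-in for a real assembler: returns the distinct reads unchanged."""
    return sorted(set(library))


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGT", "TTGACA", "GACA", "ACGTACGT"]
    print(divide_and_conquer_assembly(reads, toy_assembler, library_size=2))
```

In practice the `assemble` argument would wrap a call to an existing de novo assembler, and each library would be processed in a separate run so that peak memory is bounded by the size of one library rather than the whole data set.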