A scalable and memory-efficient algorithm for de novo transcriptome assembly of non-model organisms
BACKGROUND: With increased availability of de novo assembly algorithms, it is feasible to study entire transcriptomes of non-model organisms. While algorithms are available that are specifically designed for performing transcriptome assembly from high-throughput sequencing data, they are very memory...
Main authors: | Sze, Sing-Hoi; Pimsler, Meaghan L.; Tomberlin, Jeffery K.; Jones, Corbin D.; Tarone, Aaron M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5461550/ https://www.ncbi.nlm.nih.gov/pubmed/28589866 http://dx.doi.org/10.1186/s12864-017-3735-1 |
_version_ | 1783242353963696128 |
---|---|
author | Sze, Sing-Hoi Pimsler, Meaghan L. Tomberlin, Jeffery K. Jones, Corbin D. Tarone, Aaron M. |
author_facet | Sze, Sing-Hoi Pimsler, Meaghan L. Tomberlin, Jeffery K. Jones, Corbin D. Tarone, Aaron M. |
author_sort | Sze, Sing-Hoi |
collection | PubMed |
description | BACKGROUND: With increased availability of de novo assembly algorithms, it is feasible to study entire transcriptomes of non-model organisms. While algorithms are available that are specifically designed for performing transcriptome assembly from high-throughput sequencing data, they are very memory-intensive, limiting their applications to small data sets with few libraries. RESULTS: We develop a transcriptome assembly algorithm that recovers alternatively spliced isoforms and expression levels while utilizing as many RNA-Seq libraries as possible that contain hundreds of gigabases of data. New techniques are developed so that computations can be performed on a computing cluster with moderate amount of physical memory. CONCLUSIONS: Our strategy minimizes memory consumption while simultaneously obtaining comparable or improved accuracy over existing algorithms. It provides support for incremental updates of assemblies when new libraries become available. |
format | Online Article Text |
id | pubmed-5461550 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-54615502017-06-07 A scalable and memory-efficient algorithm for de novo transcriptome assembly of non-model organisms Sze, Sing-Hoi Pimsler, Meaghan L. Tomberlin, Jeffery K. Jones, Corbin D. Tarone, Aaron M. BMC Genomics Research BACKGROUND: With increased availability of de novo assembly algorithms, it is feasible to study entire transcriptomes of non-model organisms. While algorithms are available that are specifically designed for performing transcriptome assembly from high-throughput sequencing data, they are very memory-intensive, limiting their applications to small data sets with few libraries. RESULTS: We develop a transcriptome assembly algorithm that recovers alternatively spliced isoforms and expression levels while utilizing as many RNA-Seq libraries as possible that contain hundreds of gigabases of data. New techniques are developed so that computations can be performed on a computing cluster with moderate amount of physical memory. CONCLUSIONS: Our strategy minimizes memory consumption while simultaneously obtaining comparable or improved accuracy over existing algorithms. It provides support for incremental updates of assemblies when new libraries become available. BioMed Central 2017-05-24 /pmc/articles/PMC5461550/ /pubmed/28589866 http://dx.doi.org/10.1186/s12864-017-3735-1 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Sze, Sing-Hoi Pimsler, Meaghan L. Tomberlin, Jeffery K. Jones, Corbin D. Tarone, Aaron M. A scalable and memory-efficient algorithm for de novo transcriptome assembly of non-model organisms |
title | A scalable and memory-efficient algorithm for de novo transcriptome assembly of non-model organisms |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5461550/ https://www.ncbi.nlm.nih.gov/pubmed/28589866 http://dx.doi.org/10.1186/s12864-017-3735-1 |