
A scalable and memory-efficient algorithm for de novo transcriptome assembly of non-model organisms


Bibliographic Details
Main Authors: Sze, Sing-Hoi, Pimsler, Meaghan L., Tomberlin, Jeffery K., Jones, Corbin D., Tarone, Aaron M.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects: Research
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5461550/
https://www.ncbi.nlm.nih.gov/pubmed/28589866
http://dx.doi.org/10.1186/s12864-017-3735-1
Collection: PubMed
Description: BACKGROUND: With increased availability of de novo assembly algorithms, it is feasible to study entire transcriptomes of non-model organisms. While algorithms are available that are specifically designed for performing transcriptome assembly from high-throughput sequencing data, they are very memory-intensive, limiting their application to small data sets with few libraries. RESULTS: We develop a transcriptome assembly algorithm that recovers alternatively spliced isoforms and expression levels while utilizing as many RNA-Seq libraries as possible, which together contain hundreds of gigabases of data. New techniques are developed so that computations can be performed on a computing cluster with a moderate amount of physical memory. CONCLUSIONS: Our strategy minimizes memory consumption while obtaining comparable or improved accuracy relative to existing algorithms. It also supports incremental updates of assemblies as new libraries become available.
ID: pubmed-5461550
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Record date: 2017-06-07
Journal: BMC Genomics (Research), BioMed Central, 2017-05-24
License: © The Author(s) 2017. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.