Comparing Memory-Efficient Genome Assemblers on Stand-Alone and Cloud Infrastructures

A fundamental problem in bioinformatics is genome assembly. Next-generation sequencing (NGS) technologies produce large volumes of fragmented genome reads, which require large amounts of memory to assemble the complete genome efficiently. With recent improvements in DNA sequencing technologies, it i...

Bibliographic Details
Main Authors: Kleftogiannis, Dimitrios, Kalnis, Panos, Bajic, Vladimir B.
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3785575/
https://www.ncbi.nlm.nih.gov/pubmed/24086547
http://dx.doi.org/10.1371/journal.pone.0075505
author Kleftogiannis, Dimitrios
Kalnis, Panos
Bajic, Vladimir B.
collection PubMed
description A fundamental problem in bioinformatics is genome assembly. Next-generation sequencing (NGS) technologies produce large volumes of fragmented genome reads, which require large amounts of memory to assemble the complete genome efficiently. With recent improvements in DNA sequencing technologies, it is expected that the memory footprint required for the assembly process will increase dramatically and will emerge as a limiting factor in processing widely available NGS-generated reads. In this report, we compare current memory-efficient techniques for genome assembly with respect to quality, memory consumption and execution time. Our experiments prove that it is possible to generate draft assemblies of reasonable quality on conventional multi-purpose computers with very limited available memory by choosing suitable assembly methods. Our study reveals the minimum memory requirements for different assembly programs even when data volume exceeds memory capacity by orders of magnitude. By combining existing methodologies, we propose two general assembly strategies that can improve short-read assembly approaches and result in reduction of the memory footprint. Finally, we discuss the possibility of utilizing cloud infrastructures for genome assembly and we comment on some findings regarding suitable computational resources for assembly.
format Online
Article
Text
id pubmed-3785575
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3785575 2013-10-01 Comparing Memory-Efficient Genome Assemblers on Stand-Alone and Cloud Infrastructures Kleftogiannis, Dimitrios Kalnis, Panos Bajic, Vladimir B. PLoS One Research Article
Public Library of Science 2013-09-27 /pmc/articles/PMC3785575/ /pubmed/24086547 http://dx.doi.org/10.1371/journal.pone.0075505 Text en © 2013 Kleftogiannis et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Comparing Memory-Efficient Genome Assemblers on Stand-Alone and Cloud Infrastructures
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3785575/
https://www.ncbi.nlm.nih.gov/pubmed/24086547
http://dx.doi.org/10.1371/journal.pone.0075505