
SWAP-Assembler: scalable and efficient genome assembly towards thousands of cores

BACKGROUND: There is a widening gap between the throughput of massively parallel sequencing machines and the ability to analyze the resulting sequencing data. Traditional assembly methods require long execution times and large amounts of memory on a single workstation, which limits their use on such massive data. RE...


Bibliographic Details
Main Authors: Meng, Jintao, Wang, Bingqiang, Wei, Yanjie, Feng, Shengzhong, Balaji, Pavan
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4168705/
https://www.ncbi.nlm.nih.gov/pubmed/25253533
http://dx.doi.org/10.1186/1471-2105-15-S9-S2
author Meng, Jintao
Wang, Bingqiang
Wei, Yanjie
Feng, Shengzhong
Balaji, Pavan
author_facet Meng, Jintao
Wang, Bingqiang
Wei, Yanjie
Feng, Shengzhong
Balaji, Pavan
author_sort Meng, Jintao
collection PubMed
description BACKGROUND: There is a widening gap between the throughput of massively parallel sequencing machines and the ability to analyze the resulting sequencing data. Traditional assembly methods require long execution times and large amounts of memory on a single workstation, which limits their use on such massive data. RESULTS: This paper presents a highly scalable assembler named SWAP-Assembler for processing massive sequencing data using thousands of cores, where SWAP is an acronym for Small World Asynchronous Parallel model. In the paper, a mathematical description of the multi-step bi-directed graph (MSG) is provided to resolve the computational interdependence on merging edges, and a highly scalable computational framework for SWAP is developed to automatically perform the parallel computation of all operations. Graph cleaning and contig extension are also included to generate high-quality contigs. Experimental results show that SWAP-Assembler scales up to 2048 cores on the Yanhuang dataset, taking only 26 minutes, which is better than several other parallel assemblers, such as ABySS, Ray, and PASHA. Results also show that SWAP-Assembler can generate high-quality contigs with good N50 size and a low error rate; in particular, it generated the longest N50 contig sizes for the Fish and Yanhuang datasets. CONCLUSIONS: In this paper, we presented a highly scalable and efficient genome assembly software, SWAP-Assembler. Compared with several other assemblers, it showed very good performance in terms of scalability and contig quality. This software is available at: https://sourceforge.net/projects/swapassembler
format Online
Article
Text
id pubmed-4168705
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-41687052014-10-02 SWAP-Assembler: scalable and efficient genome assembly towards thousands of cores Meng, Jintao Wang, Bingqiang Wei, Yanjie Feng, Shengzhong Balaji, Pavan BMC Bioinformatics Proceedings BACKGROUND: There is a widening gap between the throughput of massively parallel sequencing machines and the ability to analyze the resulting sequencing data. Traditional assembly methods require long execution times and large amounts of memory on a single workstation, which limits their use on such massive data. RESULTS: This paper presents a highly scalable assembler named SWAP-Assembler for processing massive sequencing data using thousands of cores, where SWAP is an acronym for Small World Asynchronous Parallel model. In the paper, a mathematical description of the multi-step bi-directed graph (MSG) is provided to resolve the computational interdependence on merging edges, and a highly scalable computational framework for SWAP is developed to automatically perform the parallel computation of all operations. Graph cleaning and contig extension are also included to generate high-quality contigs. Experimental results show that SWAP-Assembler scales up to 2048 cores on the Yanhuang dataset, taking only 26 minutes, which is better than several other parallel assemblers, such as ABySS, Ray, and PASHA. Results also show that SWAP-Assembler can generate high-quality contigs with good N50 size and a low error rate; in particular, it generated the longest N50 contig sizes for the Fish and Yanhuang datasets. CONCLUSIONS: In this paper, we presented a highly scalable and efficient genome assembly software, SWAP-Assembler. Compared with several other assemblers, it showed very good performance in terms of scalability and contig quality.
This software is available at: https://sourceforge.net/projects/swapassembler BioMed Central 2014-09-10 /pmc/articles/PMC4168705/ /pubmed/25253533 http://dx.doi.org/10.1186/1471-2105-15-S9-S2 Text en Copyright © 2014 Meng et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Proceedings
Meng, Jintao
Wang, Bingqiang
Wei, Yanjie
Feng, Shengzhong
Balaji, Pavan
SWAP-Assembler: scalable and efficient genome assembly towards thousands of cores
title SWAP-Assembler: scalable and efficient genome assembly towards thousands of cores
title_full SWAP-Assembler: scalable and efficient genome assembly towards thousands of cores
title_fullStr SWAP-Assembler: scalable and efficient genome assembly towards thousands of cores
title_full_unstemmed SWAP-Assembler: scalable and efficient genome assembly towards thousands of cores
title_short SWAP-Assembler: scalable and efficient genome assembly towards thousands of cores
title_sort swap-assembler: scalable and efficient genome assembly towards thousands of cores
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4168705/
https://www.ncbi.nlm.nih.gov/pubmed/25253533
http://dx.doi.org/10.1186/1471-2105-15-S9-S2