SWAP-Assembler: scalable and efficient genome assembly towards thousands of cores

BACKGROUND: There is a widening gap between the throughput of massively parallel sequencing machines and the ability to analyze the resulting sequencing data. Traditional assembly methods require long execution times and large amounts of memory on a single workstation, which limits their use on such massive data. RE...

Bibliographic Details
Main authors:	Meng, Jintao; Wang, Bingqiang; Wei, Yanjie; Feng, Shengzhong; Balaji, Pavan
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Proceedings
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4168705/ https://www.ncbi.nlm.nih.gov/pubmed/25253533 http://dx.doi.org/10.1186/1471-2105-15-S9-S2

author	Meng, Jintao Wang, Bingqiang Wei, Yanjie Feng, Shengzhong Balaji, Pavan
author_sort	Meng, Jintao
collection	PubMed
description	BACKGROUND: There is a widening gap between the throughput of massively parallel sequencing machines and the ability to analyze the resulting sequencing data. Traditional assembly methods require long execution times and large amounts of memory on a single workstation, which limits their use on such massive data. RESULTS: This paper presents a highly scalable assembler named SWAP-Assembler for processing massive sequencing data using thousands of cores, where SWAP is an acronym for Small World Asynchronous Parallel model. In the paper, a mathematical description of the multi-step bi-directed graph (MSG) is provided to resolve the computational interdependence on merging edges, and a highly scalable computational framework for SWAP is developed to automatically perform the parallel computation of all operations. Graph cleaning and contig extension are also included to generate high-quality contigs. Experimental results show that SWAP-Assembler scales up to 2048 cores on the Yanhuang dataset using only 26 minutes, which is better than several other parallel assemblers, such as ABySS, Ray, and PASHA. Results also show that SWAP-Assembler can generate high-quality contigs with good N50 size and low error rate; in particular, it generated the longest N50 contig sizes for the Fish and Yanhuang datasets. CONCLUSIONS: In this paper, we presented a highly scalable and efficient genome assembly software, SWAP-Assembler. Compared with several other assemblers, it showed very good performance in terms of scalability and contig quality. This software is available at: https://sourceforge.net/projects/swapassembler
format	Online Article Text
id	pubmed-4168705
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4168705 2014-10-02 SWAP-Assembler: scalable and efficient genome assembly towards thousands of cores Meng, Jintao Wang, Bingqiang Wei, Yanjie Feng, Shengzhong Balaji, Pavan BMC Bioinformatics Proceedings BioMed Central 2014-09-10 /pmc/articles/PMC4168705/ /pubmed/25253533 http://dx.doi.org/10.1186/1471-2105-15-S9-S2 Text en Copyright © 2014 Meng et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	SWAP-Assembler: scalable and efficient genome assembly towards thousands of cores
topic	Proceedings
