SWAP-Assembler: scalable and efficient genome assembly towards thousands of cores
BACKGROUND: There is a widening gap between the throughput of massive parallel sequencing machines and the ability to analyze these sequencing data. Traditional assembly methods requiring long execution time and large amount of memory on a single workstation limit their use on these massive data. RE...
Main authors: | Meng, Jintao; Wang, Bingqiang; Wei, Yanjie; Feng, Shengzhong; Balaji, Pavan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | Proceedings |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4168705/ https://www.ncbi.nlm.nih.gov/pubmed/25253533 http://dx.doi.org/10.1186/1471-2105-15-S9-S2 |
_version_ | 1782335602738331648 |
---|---|
author | Meng, Jintao Wang, Bingqiang Wei, Yanjie Feng, Shengzhong Balaji, Pavan |
author_sort | Meng, Jintao |
collection | PubMed |
description | BACKGROUND: There is a widening gap between the throughput of massive parallel sequencing machines and the ability to analyze these sequencing data. Traditional assembly methods requiring long execution time and large amount of memory on a single workstation limit their use on these massive data. RESULTS: This paper presents a highly scalable assembler named SWAP-Assembler for processing massive sequencing data using thousands of cores, where SWAP is an acronym for Small World Asynchronous Parallel model. In the paper, a mathematical description of the multi-step bi-directed graph (MSG) is provided to resolve the computational interdependence on merging edges, and a highly scalable computational framework for SWAP is developed to automatically perform the parallel computation of all operations. Graph cleaning and contig extension are also included for generating high-quality contigs. Experimental results show that SWAP-Assembler scales up to 2048 cores on the Yanhuang dataset using only 26 minutes, which is better than several other parallel assemblers, such as ABySS, Ray, and PASHA. Results also show that SWAP-Assembler can generate high-quality contigs with good N50 size and low error rate; in particular, it generated the longest N50 contig sizes for the Fish and Yanhuang datasets. CONCLUSIONS: In this paper, we presented a highly scalable and efficient genome assembly software, SWAP-Assembler. Compared with several other assemblers, it showed very good performance in terms of scalability and contig quality. This software is available at: https://sourceforge.net/projects/swapassembler |
format | Online Article Text |
id | pubmed-4168705 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4168705 2014-10-02 SWAP-Assembler: scalable and efficient genome assembly towards thousands of cores Meng, Jintao Wang, Bingqiang Wei, Yanjie Feng, Shengzhong Balaji, Pavan BMC Bioinformatics Proceedings BACKGROUND: There is a widening gap between the throughput of massive parallel sequencing machines and the ability to analyze these sequencing data. Traditional assembly methods requiring long execution time and large amount of memory on a single workstation limit their use on these massive data. RESULTS: This paper presents a highly scalable assembler named SWAP-Assembler for processing massive sequencing data using thousands of cores, where SWAP is an acronym for Small World Asynchronous Parallel model. In the paper, a mathematical description of the multi-step bi-directed graph (MSG) is provided to resolve the computational interdependence on merging edges, and a highly scalable computational framework for SWAP is developed to automatically perform the parallel computation of all operations. Graph cleaning and contig extension are also included for generating high-quality contigs. Experimental results show that SWAP-Assembler scales up to 2048 cores on the Yanhuang dataset using only 26 minutes, which is better than several other parallel assemblers, such as ABySS, Ray, and PASHA. Results also show that SWAP-Assembler can generate high-quality contigs with good N50 size and low error rate; in particular, it generated the longest N50 contig sizes for the Fish and Yanhuang datasets. CONCLUSIONS: In this paper, we presented a highly scalable and efficient genome assembly software, SWAP-Assembler. Compared with several other assemblers, it showed very good performance in terms of scalability and contig quality. This software is available at: https://sourceforge.net/projects/swapassembler BioMed Central 2014-09-10 /pmc/articles/PMC4168705/ /pubmed/25253533 http://dx.doi.org/10.1186/1471-2105-15-S9-S2 Text en Copyright © 2014 Meng et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | SWAP-Assembler: scalable and efficient genome assembly towards thousands of cores |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4168705/ https://www.ncbi.nlm.nih.gov/pubmed/25253533 http://dx.doi.org/10.1186/1471-2105-15-S9-S2 |