
CMSA: a heterogeneous CPU/GPU computing system for multiple similar RNA/DNA sequence alignment

BACKGROUND: Multiple sequence alignment (MSA) is a classic and powerful technique for sequence analysis in bioinformatics. With the rapid growth of biological datasets, MSA parallelization becomes necessary to keep its running time at an acceptable level. Although there is a lot of work on MSA...


Bibliographic Details
Main Authors:	Chen, Xi, Wang, Chen, Tang, Shanjiang, Yu, Ce, Zou, Quan
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2017
Subjects:	Software
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5483318/ https://www.ncbi.nlm.nih.gov/pubmed/28646874 http://dx.doi.org/10.1186/s12859-017-1725-6

_version_	1783245738700963840
author	Chen, Xi Wang, Chen Tang, Shanjiang Yu, Ce Zou, Quan
author_facet	Chen, Xi Wang, Chen Tang, Shanjiang Yu, Ce Zou, Quan
author_sort	Chen, Xi
collection	PubMed
description	BACKGROUND: Multiple sequence alignment (MSA) is a classic and powerful technique for sequence analysis in bioinformatics. With the rapid growth of biological datasets, MSA parallelization becomes necessary to keep its running time at an acceptable level. Although there is a lot of work on MSA problems, existing approaches are either insufficient or rely on implicit assumptions that limit their generality. First, the properties of users’ sequences, including dataset sizes and sequence lengths, can take arbitrary values and are generally unknown before submission, a fact that previous work unfortunately ignores. Second, the center star strategy is well suited to aligning similar sequences, but its first stage, center sequence selection, is highly time-consuming and requires further optimization. Moreover, on heterogeneous CPU/GPU platforms, prior studies parallelize MSA on GPU devices only, leaving the CPUs idle during the computation. Co-run computation, however, can maximize the utilization of computing resources by running the workload on both CPU and GPU simultaneously. RESULTS: This paper presents CMSA, a robust and efficient MSA system for large-scale datasets on heterogeneous CPU/GPU platforms. It performs and optimizes multiple sequence alignment automatically for users’ submitted sequences without any assumptions. CMSA adopts the co-run computation model so that both CPU and GPU devices are fully utilized. Moreover, CMSA proposes an improved center star strategy that reduces the time complexity of its center sequence selection process from O(mn²) to O(mn). The experimental results show that CMSA achieves up to an 11× speedup and outperforms the state-of-the-art software. CONCLUSION: CMSA focuses on the alignment of multiple similar RNA/DNA sequences and proposes a novel bitmap-based algorithm to improve the center star strategy. We can conclude that harnessing the high performance of modern GPUs is a promising approach to accelerating multiple sequence alignment. In addition, adopting the co-run computation model can significantly improve overall system utilization. The source code is available at https://github.com/wangvsa/CMSA.
format	Online Article Text
id	pubmed-5483318
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5483318 2017-06-26 CMSA: a heterogeneous CPU/GPU computing system for multiple similar RNA/DNA sequence alignment Chen, Xi Wang, Chen Tang, Shanjiang Yu, Ce Zou, Quan BMC Bioinformatics Software BioMed Central 2017-06-24 /pmc/articles/PMC5483318/ /pubmed/28646874 http://dx.doi.org/10.1186/s12859-017-1725-6 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Software Chen, Xi Wang, Chen Tang, Shanjiang Yu, Ce Zou, Quan CMSA: a heterogeneous CPU/GPU computing system for multiple similar RNA/DNA sequence alignment
title	CMSA: a heterogeneous CPU/GPU computing system for multiple similar RNA/DNA sequence alignment
title_full	CMSA: a heterogeneous CPU/GPU computing system for multiple similar RNA/DNA sequence alignment
title_fullStr	CMSA: a heterogeneous CPU/GPU computing system for multiple similar RNA/DNA sequence alignment
title_full_unstemmed	CMSA: a heterogeneous CPU/GPU computing system for multiple similar RNA/DNA sequence alignment
title_short	CMSA: a heterogeneous CPU/GPU computing system for multiple similar RNA/DNA sequence alignment
title_sort	cmsa: a heterogeneous cpu/gpu computing system for multiple similar rna/dna sequence alignment
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5483318/ https://www.ncbi.nlm.nih.gov/pubmed/28646874 http://dx.doi.org/10.1186/s12859-017-1725-6


Similar Items