Cargando…

A fast parallel algorithm for finding the longest common sequence of multiple biosequences

BACKGROUND: Searching for the longest common sequence (LCS) of multiple biosequences is one of the most fundamental tasks in bioinformatics. In this paper, we present a parallel algorithm named FAST_LCS to speedup the computation for finding LCS. RESULTS: A fast parallel algorithm for LCS is present...

Descripción completa

Detalles Bibliográficos
Autores principales:	Chen, Yixin, Wan, Andrew, Liu, Wei
Formato:	Texto
Lenguaje:	English
Publicado:	BioMed Central 2006
Materias:	Research
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1780122/ https://www.ncbi.nlm.nih.gov/pubmed/17217522 http://dx.doi.org/10.1186/1471-2105-7-S4-S4

_version_	1782131849542238208
author	Chen, Yixin Wan, Andrew Liu, Wei
author_facet	Chen, Yixin Wan, Andrew Liu, Wei
author_sort	Chen, Yixin
collection	PubMed
description	BACKGROUND: Searching for the longest common sequence (LCS) of multiple biosequences is one of the most fundamental tasks in bioinformatics. In this paper, we present a parallel algorithm named FAST_LCS to speedup the computation for finding LCS. RESULTS: A fast parallel algorithm for LCS is presented. The algorithm first constructs a novel successor table to obtain all the identical pairs and their levels. It then obtains the LCS by tracing back from the identical character pairs at the last level. Effective pruning techniques are developed to significantly reduce the computational complexity. Experimental results on gene sequences in the tigr database show that our algorithm is optimal and much more efficient than other leading LCS algorithms. CONCLUSION: We have developed one of the fastest parallel LCS algorithms on an MPP parallel computing model. For two sequences X and Y with lengths n and m, respectively, the memory required is max{4(n+1)+4(m+1), L}, where L is the number of identical character pairs. The time complexity is O(L) for sequential execution, and O(\|LCS(X, Y)\|) for parallel execution, where \|LCS(X, Y)\| is the length of the LCS of X and Y. For n sequences X(1), X(2), ..., X(n), the time complexity is O(L) for sequential execution, and O(\|LCS(X(1), X(2), ..., X(n))\|) for parallel execution. Experimental results support our analysis by showing significant improvement of the proposed method over other leading LCS algorithms.
format	Text
id	pubmed-1780122
institution	National Center for Biotechnology Information
language	English
publishDate	2006
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-17801222007-01-24 A fast parallel algorithm for finding the longest common sequence of multiple biosequences Chen, Yixin Wan, Andrew Liu, Wei BMC Bioinformatics Research BACKGROUND: Searching for the longest common sequence (LCS) of multiple biosequences is one of the most fundamental tasks in bioinformatics. In this paper, we present a parallel algorithm named FAST_LCS to speedup the computation for finding LCS. RESULTS: A fast parallel algorithm for LCS is presented. The algorithm first constructs a novel successor table to obtain all the identical pairs and their levels. It then obtains the LCS by tracing back from the identical character pairs at the last level. Effective pruning techniques are developed to significantly reduce the computational complexity. Experimental results on gene sequences in the tigr database show that our algorithm is optimal and much more efficient than other leading LCS algorithms. CONCLUSION: We have developed one of the fastest parallel LCS algorithms on an MPP parallel computing model. For two sequences X and Y with lengths n and m, respectively, the memory required is max{4(n+1)+4(m+1), L}, where L is the number of identical character pairs. The time complexity is O(L) for sequential execution, and O(\|LCS(X, Y)\|) for parallel execution, where \|LCS(X, Y)\| is the length of the LCS of X and Y. For n sequences X(1), X(2), ..., X(n), the time complexity is O(L) for sequential execution, and O(\|LCS(X(1), X(2), ..., X(n))\|) for parallel execution. Experimental results support our analysis by showing significant improvement of the proposed method over other leading LCS algorithms. BioMed Central 2006-12-12 /pmc/articles/PMC1780122/ /pubmed/17217522 http://dx.doi.org/10.1186/1471-2105-7-S4-S4 Text en Copyright © 2006 Chen et al; licensee BioMed Central Ltd http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License ( (http://creativecommons.org/licenses/by/2.0) ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Chen, Yixin Wan, Andrew Liu, Wei A fast parallel algorithm for finding the longest common sequence of multiple biosequences
title	A fast parallel algorithm for finding the longest common sequence of multiple biosequences
title_full	A fast parallel algorithm for finding the longest common sequence of multiple biosequences
title_fullStr	A fast parallel algorithm for finding the longest common sequence of multiple biosequences
title_full_unstemmed	A fast parallel algorithm for finding the longest common sequence of multiple biosequences
title_short	A fast parallel algorithm for finding the longest common sequence of multiple biosequences
title_sort	fast parallel algorithm for finding the longest common sequence of multiple biosequences
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1780122/ https://www.ncbi.nlm.nih.gov/pubmed/17217522 http://dx.doi.org/10.1186/1471-2105-7-S4-S4
work_keys_str_mv	AT chenyixin afastparallelalgorithmforfindingthelongestcommonsequenceofmultiplebiosequences AT wanandrew afastparallelalgorithmforfindingthelongestcommonsequenceofmultiplebiosequences AT liuwei afastparallelalgorithmforfindingthelongestcommonsequenceofmultiplebiosequences AT chenyixin fastparallelalgorithmforfindingthelongestcommonsequenceofmultiplebiosequences AT wanandrew fastparallelalgorithmforfindingthelongestcommonsequenceofmultiplebiosequences AT liuwei fastparallelalgorithmforfindingthelongestcommonsequenceofmultiplebiosequences

A fast parallel algorithm for finding the longest common sequence of multiple biosequences

Ejemplares similares