ClustalXeed: a GUI-based grid computation version for high performance and terabyte size multiple sequence alignment
BACKGROUND: There is an increasing demand to assemble and align large-scale biological sequence data sets. The commonly used multiple sequence alignment programs are still limited in their ability to handle very large amounts of sequences because the system lacks a scalable high-performance computin...
Main Authors: | Kim, Taeho Joo, Hyun |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2010 |
Subjects: | Software |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2949895/ https://www.ncbi.nlm.nih.gov/pubmed/20849574 http://dx.doi.org/10.1186/1471-2105-11-467 |
_version_ | 1782187606130294784 |
---|---|
author | Kim, Taeho Joo, Hyun |
author_facet | Kim, Taeho Joo, Hyun |
author_sort | Kim, Taeho |
collection | PubMed |
description | BACKGROUND: There is an increasing demand to assemble and align large-scale biological sequence data sets. The commonly used multiple sequence alignment programs are still limited in their ability to handle very large amounts of sequences because the system lacks a scalable high-performance computing (HPC) environment with a greatly extended data storage capacity. RESULTS: We designed ClustalXeed, a software system for multiple sequence alignment with incremental improvements over previous versions of the ClustalX and ClustalW-MPI software. The primary advantage of ClustalXeed over other multiple sequence alignment software is its ability to align a large family of protein or nucleic acid sequences. To solve the conventional memory-dependency problem, ClustalXeed uses both physical random access memory (RAM) and a distributed file-allocation system for distance matrix construction and pair-align computation. The computation efficiency of disk-storage system was markedly improved by implementing an efficient load-balancing algorithm, called "idle node-seeking task algorithm" (INSTA). The new editing option and the graphical user interface (GUI) provide ready access to a parallel-computing environment for users who seek fast and easy alignment of large DNA and protein sequence sets. CONCLUSIONS: ClustalXeed can now compute a large volume of biological sequence data sets, which were not tractable in any other parallel or single MSA program. The main developments include: 1) the ability to tackle larger sequence alignment problems than possible with previous systems through markedly improved storage-handling capabilities. 2) Implementing an efficient task load-balancing algorithm, INSTA, which improves overall processing times for multiple sequence alignment with input sequences of non-uniform length. 3) Support for both single PC and distributed cluster systems. |
format | Text |
id | pubmed-2949895 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2949895 2010-10-06 ClustalXeed: a GUI-based grid computation version for high performance and terabyte size multiple sequence alignment Kim, Taeho Joo, Hyun BMC Bioinformatics Software BACKGROUND: There is an increasing demand to assemble and align large-scale biological sequence data sets. The commonly used multiple sequence alignment programs are still limited in their ability to handle very large amounts of sequences because the system lacks a scalable high-performance computing (HPC) environment with a greatly extended data storage capacity. RESULTS: We designed ClustalXeed, a software system for multiple sequence alignment with incremental improvements over previous versions of the ClustalX and ClustalW-MPI software. The primary advantage of ClustalXeed over other multiple sequence alignment software is its ability to align a large family of protein or nucleic acid sequences. To solve the conventional memory-dependency problem, ClustalXeed uses both physical random access memory (RAM) and a distributed file-allocation system for distance matrix construction and pair-align computation. The computation efficiency of disk-storage system was markedly improved by implementing an efficient load-balancing algorithm, called "idle node-seeking task algorithm" (INSTA). The new editing option and the graphical user interface (GUI) provide ready access to a parallel-computing environment for users who seek fast and easy alignment of large DNA and protein sequence sets. CONCLUSIONS: ClustalXeed can now compute a large volume of biological sequence data sets, which were not tractable in any other parallel or single MSA program. The main developments include: 1) the ability to tackle larger sequence alignment problems than possible with previous systems through markedly improved storage-handling capabilities. 2) Implementing an efficient task load-balancing algorithm, INSTA, which improves overall processing times for multiple sequence alignment with input sequences of non-uniform length. 3) Support for both single PC and distributed cluster systems. BioMed Central 2010-09-17 /pmc/articles/PMC2949895/ /pubmed/20849574 http://dx.doi.org/10.1186/1471-2105-11-467 Text en Copyright ©2010 Kim and Joo; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Software Kim, Taeho Joo, Hyun ClustalXeed: a GUI-based grid computation version for high performance and terabyte size multiple sequence alignment |
title | ClustalXeed: a GUI-based grid computation version for high performance and terabyte size multiple sequence alignment |
title_full | ClustalXeed: a GUI-based grid computation version for high performance and terabyte size multiple sequence alignment |
title_fullStr | ClustalXeed: a GUI-based grid computation version for high performance and terabyte size multiple sequence alignment |
title_full_unstemmed | ClustalXeed: a GUI-based grid computation version for high performance and terabyte size multiple sequence alignment |
title_short | ClustalXeed: a GUI-based grid computation version for high performance and terabyte size multiple sequence alignment |
title_sort | clustalxeed: a gui-based grid computation version for high performance and terabyte size multiple sequence alignment |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2949895/ https://www.ncbi.nlm.nih.gov/pubmed/20849574 http://dx.doi.org/10.1186/1471-2105-11-467 |
work_keys_str_mv | AT kimtaeho clustalxeedaguibasedgridcomputationversionforhighperformanceandterabytesizemultiplesequencealignment AT joohyun clustalxeedaguibasedgridcomputationversionforhighperformanceandterabytesizemultiplesequencealignment |