SS-Wrapper: a package of wrapper applications for similarity searches on Linux clusters
BACKGROUND: Large-scale sequence comparison is a powerful tool for biological inference in modern molecular biology. Comparing new sequences to those in annotated databases is a useful source of functional and structural information about these sequences. Using software such as the basic local align...
Main authors: | Wang, Chunlin; Lefkowitz, Elliot J |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2004 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC545957/ https://www.ncbi.nlm.nih.gov/pubmed/15511296 http://dx.doi.org/10.1186/1471-2105-5-171 |
_version_ | 1782122224481730560 |
---|---|
author | Wang, Chunlin; Lefkowitz, Elliot J |
collection | PubMed |
description | BACKGROUND: Large-scale sequence comparison is a powerful tool for biological inference in modern molecular biology. Comparing new sequences to those in annotated databases is a useful source of functional and structural information about these sequences. Using software such as the basic local alignment search tool (BLAST) or HMMPFAM to identify statistically significant matches between newly sequenced segments of genetic material and those in databases is an important task for most molecular biologists. Searching algorithms are intrinsically slow and data-intensive, especially in light of the rapid growth of biological sequence databases due to the emergence of high throughput DNA sequencing techniques. Thus, traditional bioinformatics tools are impractical on PCs and even on dedicated UNIX servers. To take advantage of larger databases and more reliable methods, high performance computation becomes necessary. RESULTS: We describe the implementation of SS-Wrapper (Similarity Search Wrapper), a package of wrapper applications that can parallelize similarity search applications on a Linux cluster. Our wrapper utilizes a query segmentation-search (QS-search) approach to parallelize sequence database search applications. It takes into consideration load balancing between each node on the cluster to maximize resource usage. QS-search is designed to wrap many different search tools, such as BLAST and HMMPFAM using the same interface. This implementation does not alter the original program, so newly obtained programs and program updates should be accommodated easily. Benchmark experiments using QS-search to optimize BLAST and HMMPFAM showed that QS-search accelerated the performance of these programs almost linearly in proportion to the number of CPUs used. We have also implemented a wrapper that utilizes a database segmentation approach (DS-BLAST) that provides a complementary solution for BLAST searches when the database is too large to fit into the memory of a single node. CONCLUSIONS: Used together, QS-search and DS-BLAST provide a flexible solution to adapt sequential similarity searching applications in high performance computing environments. Their ease of use and their ability to wrap a variety of database search programs provide an analytical architecture to assist both the seasoned bioinformaticist and the wet-bench biologist. |
format | Text |
id | pubmed-545957 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2004 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-545957 2005-01-28 SS-Wrapper: a package of wrapper applications for similarity searches on Linux clusters Wang, Chunlin; Lefkowitz, Elliot J BMC Bioinformatics Software BioMed Central 2004-10-28 /pmc/articles/PMC545957/ /pubmed/15511296 http://dx.doi.org/10.1186/1471-2105-5-171 Text en Copyright © 2004 Wang and Lefkowitz; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | SS-Wrapper: a package of wrapper applications for similarity searches on Linux clusters |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC545957/ https://www.ncbi.nlm.nih.gov/pubmed/15511296 http://dx.doi.org/10.1186/1471-2105-5-171 |
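
The query segmentation idea summarized in the abstract (split the set of query sequences across cluster nodes and balance the load between them) can be illustrated with a minimal Python sketch. This is not the SS-Wrapper code. The function names `parse_fasta` and `segment_queries` are hypothetical, and balancing by total sequence length with a greedy longest-first assignment is an assumption, since this record does not say how SS-Wrapper measures or balances load.

```python
import heapq
from typing import List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA-formatted text into (header, sequence) pairs."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def segment_queries(records: List[Record], n_workers: int) -> List[List[Record]]:
    """Split queries into n_workers bins with roughly equal total length.

    Assumption: sequence length is used as a stand-in for search cost.
    Queries are assigned longest first to the currently lightest bin.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    # Min-heap of (current load, bin index); the lightest bin is popped first.
    heap = [(0, i) for i in range(n_workers)]
    bins: List[List[Record]] = [[] for _ in range(n_workers)]
    for rec in sorted(records, key=lambda r: len(r[1]), reverse=True):
        load, idx = heapq.heappop(heap)
        bins[idx].append(rec)
        heapq.heappush(heap, (load + len(rec[1]), idx))
    return bins
```

In such a scheme, each bin could be written to its own FASTA file and passed unchanged to the search program on one node, which is consistent with the abstract's point that the wrapped program itself is not altered.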