
SS-Wrapper: a package of wrapper applications for similarity searches on Linux clusters

Bibliographic details
Main authors: Wang, Chunlin; Lefkowitz, Elliot J
Format: Text
Language: English
Published: BioMed Central, 2004
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC545957/
https://www.ncbi.nlm.nih.gov/pubmed/15511296
http://dx.doi.org/10.1186/1471-2105-5-171
author Wang, Chunlin
Lefkowitz, Elliot J
collection PubMed
description BACKGROUND: Large-scale sequence comparison is a powerful tool for biological inference in modern molecular biology. Comparing new sequences to those in annotated databases is a useful source of functional and structural information about these sequences. Using software such as the basic local alignment search tool (BLAST) or HMMPFAM to identify statistically significant matches between newly sequenced segments of genetic material and those in databases is an important task for most molecular biologists. Searching algorithms are intrinsically slow and data-intensive, especially in light of the rapid growth of biological sequence databases driven by high-throughput DNA sequencing techniques. Thus, traditional bioinformatics tools are impractical on PCs and even on dedicated UNIX servers. To take advantage of larger databases and more reliable methods, high-performance computation becomes necessary.
RESULTS: We describe the implementation of SS-Wrapper (Similarity Search Wrapper), a package of wrapper applications that can parallelize similarity search applications on a Linux cluster. Our wrapper uses a query segmentation-search (QS-search) approach to parallelize sequence database search applications. It takes load balancing among the cluster nodes into account to maximize resource usage. QS-search is designed to wrap many different search tools, such as BLAST and HMMPFAM, through the same interface. This implementation does not alter the original program, so newly obtained programs and program updates should be accommodated easily. Benchmark experiments using QS-search to parallelize BLAST and HMMPFAM showed that QS-search accelerated these programs almost linearly in proportion to the number of CPUs used. We have also implemented a wrapper that uses a database segmentation approach (DS-BLAST), which provides a complementary solution for BLAST searches when the database is too large to fit into the memory of a single node.
CONCLUSIONS: Used together, QS-search and DS-BLAST provide a flexible solution for adapting sequential similarity searching applications to high-performance computing environments. Their ease of use and their ability to wrap a variety of database search programs provide an analytical architecture to assist both the seasoned bioinformaticist and the wet-bench biologist.
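The query segmentation idea described above can be illustrated with a minimal sketch. This is not SS-Wrapper's code: the abstract says only that QS-search splits the query set, runs the unmodified search program on each piece, and takes load balancing into account. The greedy longest-first balancing by residue count, the function names `parse_fasta`, `segment_queries` and `qs_search`, and the `search_fn` callable (a hypothetical stand-in for one run of BLAST or HMMPFAM on one chunk) are all assumptions made for illustration; threads stand in for cluster nodes.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Record = Tuple[str, str]  # (FASTA header without ">", sequence)


def parse_fasta(text: str) -> List[Record]:
    """Split multi-FASTA text into (header, sequence) records."""
    records: List[Record] = []
    header, seq_lines = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq_lines)))
            header, seq_lines = line[1:], []
        else:
            seq_lines.append(line)
    if header is not None:
        records.append((header, "".join(seq_lines)))
    return records


def segment_queries(records: List[Record], n_chunks: int) -> List[List[int]]:
    """Split query indices into n_chunks with roughly equal total residues.

    Assumed heuristic (greedy longest-first): each query, longest first,
    goes to the chunk with the smallest residue total so far.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    heap = [(0, c) for c in range(n_chunks)]  # (residue load, chunk id)
    chunks: List[List[int]] = [[] for _ in range(n_chunks)]
    by_length = sorted(range(len(records)), key=lambda i: len(records[i][1]), reverse=True)
    for i in by_length:
        load, c = heapq.heappop(heap)
        chunks[c].append(i)
        heapq.heappush(heap, (load + len(records[i][1]), c))
    return chunks


def qs_search(
    records: List[Record],
    n_chunks: int,
    search_fn: Callable[[List[Record]], List[str]],
) -> List[str]:
    """Search each chunk independently, then restore the original query order.

    search_fn (hypothetical) stands in for one run of an unmodified search
    program on one chunk; it must return one result per record, in order.
    """
    chunks = segment_queries(records, n_chunks)

    def run(chunk: List[int]) -> List[Tuple[int, str]]:
        hits = search_fn([records[i] for i in chunk])
        return list(zip(chunk, hits))

    results = {}
    # Threads stand in for cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        for pairs in pool.map(run, chunks):
            results.update(pairs)
    return [results[i] for i in range(len(records))]
```

For example, with queries of 3, 8, 2 and 5 residues split into two chunks, the greedy step yields chunk loads of 10 and 8 residues, and `qs_search` returns one result per query in the original input order. Balancing by residue count is used here because search time roughly grows with query length; a real wrapper might instead dispatch chunks dynamically to whichever node becomes free.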
format Text
id pubmed-545957
institution National Center for Biotechnology Information
language English
publishDate 2004
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-545957, 2005-01-28; BMC Bioinformatics, Software; BioMed Central, 2004-10-28. Copyright © 2004 Wang and Lefkowitz; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title SS-Wrapper: a package of wrapper applications for similarity searches on Linux clusters
topic Software