An algorithm of discovering signatures from DNA databases on a computer cluster
BACKGROUND: Signatures are short sequences that are unique and not similar to any other sequence in a database that can be used as the basis to identify different species. Even though several signature discovery algorithms have been proposed in the past, these algorithms require the entirety of data...
Main authors: | Lee, Hsiao Ping; Sheu, Tzu-Fang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4286918/ https://www.ncbi.nlm.nih.gov/pubmed/25282047 http://dx.doi.org/10.1186/1471-2105-15-339 |
_version_ | 1782351728927047680 |
---|---|
author | Lee, Hsiao Ping Sheu, Tzu-Fang |
author_facet | Lee, Hsiao Ping Sheu, Tzu-Fang |
author_sort | Lee, Hsiao Ping |
collection | PubMed |
description | BACKGROUND: Signatures are short sequences that are unique and not similar to any other sequence in a database that can be used as the basis to identify different species. Even though several signature discovery algorithms have been proposed in the past, these algorithms require the entirety of databases to be loaded in the memory, thus restricting the amount of data that they can process. It makes those algorithms unable to process databases with large amounts of data. Also, those algorithms use sequential models and have slower discovery speeds, meaning that the efficiency can be improved. RESULTS: In this research, we are debuting the utilization of a divide-and-conquer strategy in signature discovery and have proposed a parallel signature discovery algorithm on a computer cluster. The algorithm applies the divide-and-conquer strategy to solve the problem posed to the existing algorithms where they are unable to process large databases and uses a parallel computing mechanism to effectively improve the efficiency of signature discovery. Even when run with just the memory of regular personal computers, the algorithm can still process large databases such as the human whole-genome EST database which were previously unable to be processed by the existing algorithms. CONCLUSIONS: The algorithm proposed in this research is not limited by the amount of usable memory and can rapidly find signatures in large databases, making it useful in applications such as Next Generation Sequencing and other large database analysis and processing. The implementation of the proposed algorithm is available at http://www.cs.pu.edu.tw/~fang/DDCSDPrograms/DDCSD.htm. |
format | Online Article Text |
id | pubmed-4286918 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4286918 2015-01-09 An algorithm of discovering signatures from DNA databases on a computer cluster Lee, Hsiao Ping Sheu, Tzu-Fang BMC Bioinformatics Methodology Article BACKGROUND: Signatures are short sequences that are unique and not similar to any other sequence in a database that can be used as the basis to identify different species. Even though several signature discovery algorithms have been proposed in the past, these algorithms require the entirety of databases to be loaded in the memory, thus restricting the amount of data that they can process. It makes those algorithms unable to process databases with large amounts of data. Also, those algorithms use sequential models and have slower discovery speeds, meaning that the efficiency can be improved. RESULTS: In this research, we are debuting the utilization of a divide-and-conquer strategy in signature discovery and have proposed a parallel signature discovery algorithm on a computer cluster. The algorithm applies the divide-and-conquer strategy to solve the problem posed to the existing algorithms where they are unable to process large databases and uses a parallel computing mechanism to effectively improve the efficiency of signature discovery. Even when run with just the memory of regular personal computers, the algorithm can still process large databases such as the human whole-genome EST database which were previously unable to be processed by the existing algorithms. CONCLUSIONS: The algorithm proposed in this research is not limited by the amount of usable memory and can rapidly find signatures in large databases, making it useful in applications such as Next Generation Sequencing and other large database analysis and processing. The implementation of the proposed algorithm is available at http://www.cs.pu.edu.tw/~fang/DDCSDPrograms/DDCSD.htm.
BioMed Central 2014-10-05 /pmc/articles/PMC4286918/ /pubmed/25282047 http://dx.doi.org/10.1186/1471-2105-15-339 Text en © Lee and Sheu; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Lee, Hsiao Ping Sheu, Tzu-Fang An algorithm of discovering signatures from DNA databases on a computer cluster |
title | An algorithm of discovering signatures from DNA databases on a computer cluster |
title_full | An algorithm of discovering signatures from DNA databases on a computer cluster |
title_fullStr | An algorithm of discovering signatures from DNA databases on a computer cluster |
title_full_unstemmed | An algorithm of discovering signatures from DNA databases on a computer cluster |
title_short | An algorithm of discovering signatures from DNA databases on a computer cluster |
title_sort | algorithm of discovering signatures from dna databases on a computer cluster |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4286918/ https://www.ncbi.nlm.nih.gov/pubmed/25282047 http://dx.doi.org/10.1186/1471-2105-15-339 |
work_keys_str_mv | AT leehsiaoping analgorithmofdiscoveringsignaturesfromdnadatabasesonacomputercluster AT sheutzufang analgorithmofdiscoveringsignaturesfromdnadatabasesonacomputercluster AT leehsiaoping algorithmofdiscoveringsignaturesfromdnadatabasesonacomputercluster AT sheutzufang algorithmofdiscoveringsignaturesfromdnadatabasesonacomputercluster |