An algorithm of discovering signatures from DNA databases on a computer cluster

BACKGROUND: Signatures are short sequences that are unique and not similar to any other sequence in a database that can be used as the basis to identify different species. Even though several signature discovery algorithms have been proposed in the past, these algorithms require the entirety of data...

Bibliographic Details
Main authors: Lee, Hsiao Ping, Sheu, Tzu-Fang
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4286918/
https://www.ncbi.nlm.nih.gov/pubmed/25282047
http://dx.doi.org/10.1186/1471-2105-15-339
author Lee, Hsiao Ping
Sheu, Tzu-Fang
collection PubMed
description BACKGROUND: Signatures are short sequences that are unique and not similar to any other sequence in a database that can be used as the basis to identify different species. Even though several signature discovery algorithms have been proposed in the past, these algorithms require the entirety of databases to be loaded in the memory, thus restricting the amount of data that they can process. It makes those algorithms unable to process databases with large amounts of data. Also, those algorithms use sequential models and have slower discovery speeds, meaning that the efficiency can be improved. RESULTS: In this research, we are debuting the utilization of a divide-and-conquer strategy in signature discovery and have proposed a parallel signature discovery algorithm on a computer cluster. The algorithm applies the divide-and-conquer strategy to solve the problem posed to the existing algorithms where they are unable to process large databases and uses a parallel computing mechanism to effectively improve the efficiency of signature discovery. Even when run with just the memory of regular personal computers, the algorithm can still process large databases such as the human whole-genome EST database which were previously unable to be processed by the existing algorithms. CONCLUSIONS: The algorithm proposed in this research is not limited by the amount of usable memory and can rapidly find signatures in large databases, making it useful in applications such as Next Generation Sequencing and other large database analysis and processing. The implementation of the proposed algorithm is available at http://www.cs.pu.edu.tw/~fang/DDCSDPrograms/DDCSD.htm.
format Online
Article
Text
id pubmed-4286918
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4286918 2015-01-09 An algorithm of discovering signatures from DNA databases on a computer cluster Lee, Hsiao Ping Sheu, Tzu-Fang BMC Bioinformatics Methodology Article
BioMed Central 2014-10-05 /pmc/articles/PMC4286918/ /pubmed/25282047 http://dx.doi.org/10.1186/1471-2105-15-339 Text en © Lee and Sheu; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title An algorithm of discovering signatures from DNA databases on a computer cluster
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4286918/
https://www.ncbi.nlm.nih.gov/pubmed/25282047
http://dx.doi.org/10.1186/1471-2105-15-339