An algorithm of discovering signatures from DNA databases on a computer cluster

BACKGROUND: Signatures are short sequences that are unique and not similar to any other sequence in a database that can be used as the basis to identify different species. Even though several signature discovery algorithms have been proposed in the past, these algorithms require the entirety of data...

Bibliographic Details
Main Authors:	Lee, Hsiao Ping, Sheu, Tzu-Fang
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4286918/ https://www.ncbi.nlm.nih.gov/pubmed/25282047 http://dx.doi.org/10.1186/1471-2105-15-339

_version_	1782351728927047680
author	Lee, Hsiao Ping Sheu, Tzu-Fang
author_facet	Lee, Hsiao Ping Sheu, Tzu-Fang
author_sort	Lee, Hsiao Ping
collection	PubMed
description	BACKGROUND: Signatures are short sequences that are unique and not similar to any other sequence in a database that can be used as the basis to identify different species. Even though several signature discovery algorithms have been proposed in the past, these algorithms require the entirety of databases to be loaded in the memory, thus restricting the amount of data that they can process. It makes those algorithms unable to process databases with large amounts of data. Also, those algorithms use sequential models and have slower discovery speeds, meaning that the efficiency can be improved. RESULTS: In this research, we are debuting the utilization of a divide-and-conquer strategy in signature discovery and have proposed a parallel signature discovery algorithm on a computer cluster. The algorithm applies the divide-and-conquer strategy to solve the problem posed to the existing algorithms where they are unable to process large databases and uses a parallel computing mechanism to effectively improve the efficiency of signature discovery. Even when run with just the memory of regular personal computers, the algorithm can still process large databases such as the human whole-genome EST database which were previously unable to be processed by the existing algorithms. CONCLUSIONS: The algorithm proposed in this research is not limited by the amount of usable memory and can rapidly find signatures in large databases, making it useful in applications such as Next Generation Sequencing and other large database analysis and processing. The implementation of the proposed algorithm is available at http://www.cs.pu.edu.tw/~fang/DDCSDPrograms/DDCSD.htm.
format	Online Article Text
id	pubmed-4286918
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4286918 2015-01-09 An algorithm of discovering signatures from DNA databases on a computer cluster Lee, Hsiao Ping Sheu, Tzu-Fang BMC Bioinformatics Methodology Article BACKGROUND: Signatures are short sequences that are unique and not similar to any other sequence in a database that can be used as the basis to identify different species. Even though several signature discovery algorithms have been proposed in the past, these algorithms require the entirety of databases to be loaded in the memory, thus restricting the amount of data that they can process. It makes those algorithms unable to process databases with large amounts of data. Also, those algorithms use sequential models and have slower discovery speeds, meaning that the efficiency can be improved. RESULTS: In this research, we are debuting the utilization of a divide-and-conquer strategy in signature discovery and have proposed a parallel signature discovery algorithm on a computer cluster. The algorithm applies the divide-and-conquer strategy to solve the problem posed to the existing algorithms where they are unable to process large databases and uses a parallel computing mechanism to effectively improve the efficiency of signature discovery. Even when run with just the memory of regular personal computers, the algorithm can still process large databases such as the human whole-genome EST database which were previously unable to be processed by the existing algorithms. CONCLUSIONS: The algorithm proposed in this research is not limited by the amount of usable memory and can rapidly find signatures in large databases, making it useful in applications such as Next Generation Sequencing and other large database analysis and processing. The implementation of the proposed algorithm is available at http://www.cs.pu.edu.tw/~fang/DDCSDPrograms/DDCSD.htm.
BioMed Central 2014-10-05 /pmc/articles/PMC4286918/ /pubmed/25282047 http://dx.doi.org/10.1186/1471-2105-15-339 Text en © Lee and Sheu; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Methodology Article Lee, Hsiao Ping Sheu, Tzu-Fang An algorithm of discovering signatures from DNA databases on a computer cluster
title	An algorithm of discovering signatures from DNA databases on a computer cluster
title_full	An algorithm of discovering signatures from DNA databases on a computer cluster
title_fullStr	An algorithm of discovering signatures from DNA databases on a computer cluster
title_full_unstemmed	An algorithm of discovering signatures from DNA databases on a computer cluster
title_short	An algorithm of discovering signatures from DNA databases on a computer cluster
title_sort	algorithm of discovering signatures from dna databases on a computer cluster
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4286918/ https://www.ncbi.nlm.nih.gov/pubmed/25282047 http://dx.doi.org/10.1186/1471-2105-15-339
work_keys_str_mv	AT leehsiaoping analgorithmofdiscoveringsignaturesfromdnadatabasesonacomputercluster AT sheutzufang analgorithmofdiscoveringsignaturesfromdnadatabasesonacomputercluster AT leehsiaoping algorithmofdiscoveringsignaturesfromdnadatabasesonacomputercluster AT sheutzufang algorithmofdiscoveringsignaturesfromdnadatabasesonacomputercluster
