
Towards computational improvement of DNA database indexing and short DNA query searching

In order to facilitate and speed up the search of massive DNA databases, the database is indexed at the beginning, employing a mapping function. By searching through the indexed data structure, exact query hits can be identified. If the database is searched against an annotated DNA query, such as a...


Bibliographic Details
Main authors:	Stojanov, Done, Koceski, Sašo, Mileva, Aleksandra, Koceska, Nataša, Bande, Cveta Martinovska
Format:	Online Article Text
Language:	English
Published:	Taylor & Francis 2014
Subjects:	Article; Bioinformatics
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4434100/ https://www.ncbi.nlm.nih.gov/pubmed/26019584 http://dx.doi.org/10.1080/13102818.2014.959711

collection	PubMed
description	In order to facilitate and speed up the search of massive DNA databases, the database is indexed at the beginning, employing a mapping function. By searching through the indexed data structure, exact query hits can be identified. If the database is searched against an annotated DNA query, such as a known promoter consensus sequence, then the starting locations and the number of potential genes can be determined. This is particularly relevant if unannotated DNA sequences have to be functionally annotated. However, indexing a massive DNA database and searching an indexed data structure with millions of entries is a time-demanding process. In this paper, we propose a fast DNA database indexing and searching approach that identifies all query hits in the database without examining every entry in the indexed data structure, while limiting the maximum length of a query that can be searched against the database. By applying the proposed indexing equation, the whole human genome could be indexed in 10 hours on a personal computer, under the assumption that there is enough RAM to store the indexed data structure. Analysing the methodology proposed by Reneker, we observed that hits at starting positions [Image: see text] are not reported if the database is searched against a query shorter than [Image: see text] nucleotides, where [Image: see text] is the length of the DNA database words being mapped and [Image: see text] is the length of the query. A solution to this drawback is also presented.
format	Online Article Text
id	pubmed-4434100
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	Taylor & Francis
record_format	MEDLINE/PubMed
spelling	pubmed-4434100 2015-05-25 Biotechnol Biotechnol Equip Taylor & Francis 2014-09-03 2014-10-31 /pmc/articles/PMC4434100/ /pubmed/26019584 http://dx.doi.org/10.1080/13102818.2014.959711 Text en © 2014 The Author(s). Published by Taylor & Francis.
http://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/3.0/, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The moral rights of the named author(s) have been asserted.
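The description in this record outlines indexing a DNA database with a mapping function and then locating exact hits for short queries. The sketch below illustrates that general idea only. It assumes a plain 2-bit, base-4 encoding and a sorted list of (code, position) pairs; the paper's own indexing equation is not reproduced in this record, and the names `encode`, `build_index` and `search` are hypothetical.

```python
from bisect import bisect_left

# Assumed 2-bit encoding; not taken from the paper.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(word):
    """Map a DNA word to an integer by reading it as a base-4 number."""
    value = 0
    for base in word:
        value = value * 4 + CODE[base]
    return value


def build_index(db, k):
    """Return a sorted list of (code, start) pairs for every k-length word in db."""
    return sorted((encode(db[i:i + k]), i) for i in range(len(db) - k + 1))


def search(db, index, k, query):
    """Return sorted 0-based start positions of exact hits for a query of length <= k."""
    m = len(query)
    if m == 0 or m > k:
        raise ValueError("query length must be between 1 and k")
    # Every k-length word that starts with the query has a code in [low, high).
    span = 4 ** (k - m)
    low = encode(query) * span
    high = low + span
    first = bisect_left(index, (low, -1))
    last = bisect_left(index, (high, -1))
    hits = [pos for _, pos in index[first:last]]
    # No k-length word starts in the last k - 1 positions, so a query shorter
    # than k can still match there; check those positions directly.
    for i in range(max(len(db) - k + 1, 0), len(db) - m + 1):
        if db[i:i + m] == query:
            hits.append(i)
    return sorted(hits)
```

For example, with `db = "ACGTACGTAC"` and `k = 4`, calling `search(db, build_index(db, 4), 4, "AC")` consults only the index entries whose codes fall in the query's prefix range, plus the short tail of the database. The tail check shows one way hits near the end of a database can be missed when the query is shorter than the indexed word length; whether it matches the drawback and solution discussed in the paper cannot be determined from this record.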
