Cargando…

A fast algorithm for exact sequence search in biological sequences using polyphase decomposition

Motivation: Exact sequence search allows a user to search for a specific DNA subsequence in a larger DNA sequence or database. It serves as a vital block in many areas such as Pharmacogenetics, Phylogenetics and Personal Genomics. As sequencing of genomic data becomes increasingly affordable, the am...

Descripción completa

Detalles Bibliográficos
Autores principales: Srikantha, Abhilash, Bopardikar, Ajit S., Kaipa, Kalyan Kumar, Venkataraman, Parthasarathy, Lee, Kyusang, Ahn, TaeJin, Narayanan, Rangavittal
Formato: Texto
Lenguaje:English
Publicado: Oxford University Press 2010
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2935425/
https://www.ncbi.nlm.nih.gov/pubmed/20823301
http://dx.doi.org/10.1093/bioinformatics/btq364
_version_ 1782186400050839552
author Srikantha, Abhilash
Bopardikar, Ajit S.
Kaipa, Kalyan Kumar
Venkataraman, Parthasarathy
Lee, Kyusang
Ahn, TaeJin
Narayanan, Rangavittal
author_facet Srikantha, Abhilash
Bopardikar, Ajit S.
Kaipa, Kalyan Kumar
Venkataraman, Parthasarathy
Lee, Kyusang
Ahn, TaeJin
Narayanan, Rangavittal
author_sort Srikantha, Abhilash
collection PubMed
description Motivation: Exact sequence search allows a user to search for a specific DNA subsequence in a larger DNA sequence or database. It serves as a vital block in many areas such as Pharmacogenetics, Phylogenetics and Personal Genomics. As sequencing of genomic data becomes increasingly affordable, the amount of sequence data that must be processed will also increase exponentially. In this context, fast sequence search algorithms will play an important role in exploiting the information contained in the newly sequenced data. Many existing algorithms do not scale up well for large sequences or databases because of their high-computational costs. This article describes an efficient algorithm for performing fast searches on large DNA sequences. It makes use of hash tables of Q-grams that are constructed after downsampling the database, to enable efficient search and memory use. Time complexity for pattern search is reduced using beam pruning techniques. Theoretical complexity calculations and performance figures are presented to indicate the potential of the proposed algorithm. Contact: s.abhilash@samsung.com; ajit.b@samsung.com
format Text
id pubmed-2935425
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-29354252010-09-08 A fast algorithm for exact sequence search in biological sequences using polyphase decomposition Srikantha, Abhilash Bopardikar, Ajit S. Kaipa, Kalyan Kumar Venkataraman, Parthasarathy Lee, Kyusang Ahn, TaeJin Narayanan, Rangavittal Bioinformatics Eccb 2010 Conference Proceedings September 26 to September 29, 2010, Ghent, Belgium Motivation: Exact sequence search allows a user to search for a specific DNA subsequence in a larger DNA sequence or database. It serves as a vital block in many areas such as Pharmacogenetics, Phylogenetics and Personal Genomics. As sequencing of genomic data becomes increasingly affordable, the amount of sequence data that must be processed will also increase exponentially. In this context, fast sequence search algorithms will play an important role in exploiting the information contained in the newly sequenced data. Many existing algorithms do not scale up well for large sequences or databases because of their high-computational costs. This article describes an efficient algorithm for performing fast searches on large DNA sequences. It makes use of hash tables of Q-grams that are constructed after downsampling the database, to enable efficient search and memory use. Time complexity for pattern search is reduced using beam pruning techniques. Theoretical complexity calculations and performance figures are presented to indicate the potential of the proposed algorithm. Contact: s.abhilash@samsung.com; ajit.b@samsung.com Oxford University Press 2010-09-15 2010-09-04 /pmc/articles/PMC2935425/ /pubmed/20823301 http://dx.doi.org/10.1093/bioinformatics/btq364 Text en © The Author(s) 2010. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Eccb 2010 Conference Proceedings September 26 to September 29, 2010, Ghent, Belgium
Srikantha, Abhilash
Bopardikar, Ajit S.
Kaipa, Kalyan Kumar
Venkataraman, Parthasarathy
Lee, Kyusang
Ahn, TaeJin
Narayanan, Rangavittal
A fast algorithm for exact sequence search in biological sequences using polyphase decomposition
title A fast algorithm for exact sequence search in biological sequences using polyphase decomposition
title_full A fast algorithm for exact sequence search in biological sequences using polyphase decomposition
title_fullStr A fast algorithm for exact sequence search in biological sequences using polyphase decomposition
title_full_unstemmed A fast algorithm for exact sequence search in biological sequences using polyphase decomposition
title_short A fast algorithm for exact sequence search in biological sequences using polyphase decomposition
title_sort fast algorithm for exact sequence search in biological sequences using polyphase decomposition
topic Eccb 2010 Conference Proceedings September 26 to September 29, 2010, Ghent, Belgium
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2935425/
https://www.ncbi.nlm.nih.gov/pubmed/20823301
http://dx.doi.org/10.1093/bioinformatics/btq364
work_keys_str_mv AT srikanthaabhilash afastalgorithmforexactsequencesearchinbiologicalsequencesusingpolyphasedecomposition
AT bopardikarajits afastalgorithmforexactsequencesearchinbiologicalsequencesusingpolyphasedecomposition
AT kaipakalyankumar afastalgorithmforexactsequencesearchinbiologicalsequencesusingpolyphasedecomposition
AT venkataramanparthasarathy afastalgorithmforexactsequencesearchinbiologicalsequencesusingpolyphasedecomposition
AT leekyusang afastalgorithmforexactsequencesearchinbiologicalsequencesusingpolyphasedecomposition
AT ahntaejin afastalgorithmforexactsequencesearchinbiologicalsequencesusingpolyphasedecomposition
AT narayananrangavittal afastalgorithmforexactsequencesearchinbiologicalsequencesusingpolyphasedecomposition
AT srikanthaabhilash fastalgorithmforexactsequencesearchinbiologicalsequencesusingpolyphasedecomposition
AT bopardikarajits fastalgorithmforexactsequencesearchinbiologicalsequencesusingpolyphasedecomposition
AT kaipakalyankumar fastalgorithmforexactsequencesearchinbiologicalsequencesusingpolyphasedecomposition
AT venkataramanparthasarathy fastalgorithmforexactsequencesearchinbiologicalsequencesusingpolyphasedecomposition
AT leekyusang fastalgorithmforexactsequencesearchinbiologicalsequencesusingpolyphasedecomposition
AT ahntaejin fastalgorithmforexactsequencesearchinbiologicalsequencesusingpolyphasedecomposition
AT narayananrangavittal fastalgorithmforexactsequencesearchinbiologicalsequencesusingpolyphasedecomposition