Cargando…

Linear-time protein 3-D structure searching with insertions and deletions

BACKGROUND: Two biomolecular 3-D structures are said to be similar if the RMSD (root mean square deviation) between the two molecules' sequences of 3-D coordinates is less than or equal to some given constant bound. Tools for searching for similar structures in biomolecular 3-D structure databa...

Descripción completa

Detalles Bibliográficos
Autores principales: Shibuya, Tetsuo, Jansson, Jesper, Sadakane, Kunihiko
Formato: Texto
Lenguaje:English
Publicado: BioMed Central 2010
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2830924/
https://www.ncbi.nlm.nih.gov/pubmed/20047663
http://dx.doi.org/10.1186/1748-7188-5-7
_version_ 1782178181610995712
author Shibuya, Tetsuo
Jansson, Jesper
Sadakane, Kunihiko
author_facet Shibuya, Tetsuo
Jansson, Jesper
Sadakane, Kunihiko
author_sort Shibuya, Tetsuo
collection PubMed
description BACKGROUND: Two biomolecular 3-D structures are said to be similar if the RMSD (root mean square deviation) between the two molecules' sequences of 3-D coordinates is less than or equal to some given constant bound. Tools for searching for similar structures in biomolecular 3-D structure databases are becoming increasingly important in the structural biology of the post-genomic era. RESULTS: We consider an important, fundamental problem of reporting all substructures in a 3-D structure database of chain molecules (such as proteins) which are similar to a given query 3-D structure, with consideration of indels (i.e., insertions and deletions). This problem has been believed to be very difficult but its exact computational complexity has not been known. In this paper, we first prove that the problem in unbounded dimensions is NP-hard. We then propose a new algorithm that dramatically improves the average-case time complexity of the problem in 3-D in case the number of indels k is bounded by a constant. Our algorithm solves the above problem for a query of size m and a database of size N in average-case O(N) time, whereas the time complexity of the previously best algorithm was O(Nm(k+1)). CONCLUSIONS: Our results show that although the problem of searching for similar structures in a database based on the RMSD measure with indels is NP-hard in the case of unbounded dimensions, it can be solved in 3-D by a simple average-case linear time algorithm when the number of indels is bounded by a constant.
format Text
id pubmed-2830924
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-28309242010-03-03 Linear-time protein 3-D structure searching with insertions and deletions Shibuya, Tetsuo Jansson, Jesper Sadakane, Kunihiko Algorithms Mol Biol Research BACKGROUND: Two biomolecular 3-D structures are said to be similar if the RMSD (root mean square deviation) between the two molecules' sequences of 3-D coordinates is less than or equal to some given constant bound. Tools for searching for similar structures in biomolecular 3-D structure databases are becoming increasingly important in the structural biology of the post-genomic era. RESULTS: We consider an important, fundamental problem of reporting all substructures in a 3-D structure database of chain molecules (such as proteins) which are similar to a given query 3-D structure, with consideration of indels (i.e., insertions and deletions). This problem has been believed to be very difficult but its exact computational complexity has not been known. In this paper, we first prove that the problem in unbounded dimensions is NP-hard. We then propose a new algorithm that dramatically improves the average-case time complexity of the problem in 3-D in case the number of indels k is bounded by a constant. Our algorithm solves the above problem for a query of size m and a database of size N in average-case O(N) time, whereas the time complexity of the previously best algorithm was O(Nm(k+1)). CONCLUSIONS: Our results show that although the problem of searching for similar structures in a database based on the RMSD measure with indels is NP-hard in the case of unbounded dimensions, it can be solved in 3-D by a simple average-case linear time algorithm when the number of indels is bounded by a constant. BioMed Central 2010-01-04 /pmc/articles/PMC2830924/ /pubmed/20047663 http://dx.doi.org/10.1186/1748-7188-5-7 Text en Copyright ©2010 Shibuya et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Shibuya, Tetsuo
Jansson, Jesper
Sadakane, Kunihiko
Linear-time protein 3-D structure searching with insertions and deletions
title Linear-time protein 3-D structure searching with insertions and deletions
title_full Linear-time protein 3-D structure searching with insertions and deletions
title_fullStr Linear-time protein 3-D structure searching with insertions and deletions
title_full_unstemmed Linear-time protein 3-D structure searching with insertions and deletions
title_short Linear-time protein 3-D structure searching with insertions and deletions
title_sort linear-time protein 3-d structure searching with insertions and deletions
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2830924/
https://www.ncbi.nlm.nih.gov/pubmed/20047663
http://dx.doi.org/10.1186/1748-7188-5-7
work_keys_str_mv AT shibuyatetsuo lineartimeprotein3dstructuresearchingwithinsertionsanddeletions
AT janssonjesper lineartimeprotein3dstructuresearchingwithinsertionsanddeletions
AT sadakanekunihiko lineartimeprotein3dstructuresearchingwithinsertionsanddeletions