Protein structural similarity search by Ramachandran codes

BACKGROUND: Protein structural data have grown exponentially, so fast and accurate tools for structural similarity search are needed. To improve search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that...
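
The linear encoding idea described in the abstract, turning backbone dihedral angles into a text string that ordinary sequence tools can search, can be illustrated with a minimal sketch. This is not the SARST code table: the record states that SARST organizes the Ramachandran map by nearest-neighbor clustering, whereas the sketch below substitutes a plain uniform grid. The function names, the 6 x 4 grid, and the letter assignment are all hypothetical choices made for illustration.

```python
import string

# Hypothetical alphabet: one letter per grid cell (NOT the SARST code table).
ALPHABET = string.ascii_uppercase


def ramachandran_code(phi: float, psi: float,
                      bins_phi: int = 6, bins_psi: int = 4) -> str:
    """Map one (phi, psi) pair, in degrees, to a letter via a uniform grid.

    Assumption: a uniform grid stands in for the clustered Ramachandran
    map described in the abstract; bins_phi * bins_psi must not exceed 26.
    """
    # Wrap angles into [0, 360) so that 180 and -180 fall in the same cell.
    phi_shifted = (phi + 180.0) % 360.0
    psi_shifted = (psi + 180.0) % 360.0
    # Clamp guards against floating-point results that round up to 360.0.
    col = min(int(phi_shifted / (360.0 / bins_phi)), bins_phi - 1)
    row = min(int(psi_shifted / (360.0 / bins_psi)), bins_psi - 1)
    return ALPHABET[row * bins_phi + col]


def encode_backbone(angles) -> str:
    """Encode a list of (phi, psi) pairs as a one-dimensional string."""
    return "".join(ramachandran_code(phi, psi) for phi, psi in angles)


if __name__ == "__main__":
    # Idealized angles: three helical residues followed by three strand residues.
    backbone = [(-60.0, -45.0)] * 3 + [(-120.0, 130.0)] * 3
    print(encode_backbone(backbone))
```

Once structures are strings of this kind, any text-based alignment or search routine can compare them; how well that works depends on the code table and the substitution matrix, which this sketch does not attempt to reproduce.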

Bibliographic Details
Main authors: Lo, Wei-Cheng, Huang, Po-Jung, Chang, Chih-Hung, Lyu, Ping-Chiang
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2194796/
https://www.ncbi.nlm.nih.gov/pubmed/17716377
http://dx.doi.org/10.1186/1471-2105-8-307
_version_ 1782147697520672768
author Lo, Wei-Cheng
Huang, Po-Jung
Chang, Chih-Hung
Lyu, Ping-Chiang
author_facet Lo, Wei-Cheng
Huang, Po-Jung
Chang, Chih-Hung
Lyu, Ping-Chiang
author_sort Lo, Wei-Cheng
collection PubMed
description BACKGROUND: Protein structural data have grown exponentially, so fast and accurate tools for structural similarity search are needed. To improve search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods; however, accuracy is usually sacrificed, and the speed still cannot match that of sequence similarity search tools. Here, we aimed to improve the linear encoding methodology and develop efficient search tools that can rapidly retrieve structural homologs from large protein databases. RESULTS: We propose a new linear encoding method, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation). SARST transforms protein structures into text strings through a Ramachandran map organized by nearest-neighbor clustering and uses a regenerative approach to produce substitution matrices. Classical sequence similarity search methods can then be applied to structural similarity search. SARST's accuracy is similar to that of Combinatorial Extension (CE), and it runs over 243,000 times faster, searching 34,000 proteins in 0.34 sec with a 3.2-GHz CPU. SARST provides statistically meaningful expectation values to assess the retrieved information. It has been implemented as a web service and as a stand-alone Java program that runs on many different platforms. CONCLUSION: As a database search method, SARST can rapidly distinguish high from low similarities and efficiently retrieve homologous structures. It demonstrates that the easily accessible linear encoding methodology has the potential to serve as a foundation for efficient protein structural similarity search tools. These search tools should be applicable to automated, high-throughput functional annotation or prediction for the ever-increasing number of published protein structures in this post-genomic era.
format Text
id pubmed-2194796
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2194796 2008-01-14 Protein structural similarity search by Ramachandran codes Lo, Wei-Cheng Huang, Po-Jung Chang, Chih-Hung Lyu, Ping-Chiang BMC Bioinformatics Methodology Article BioMed Central 2007-08-23 /pmc/articles/PMC2194796/ /pubmed/17716377 http://dx.doi.org/10.1186/1471-2105-8-307 Text en Copyright © 2007 Lo et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology Article
Lo, Wei-Cheng
Huang, Po-Jung
Chang, Chih-Hung
Lyu, Ping-Chiang
Protein structural similarity search by Ramachandran codes
title Protein structural similarity search by Ramachandran codes
title_full Protein structural similarity search by Ramachandran codes
title_fullStr Protein structural similarity search by Ramachandran codes
title_full_unstemmed Protein structural similarity search by Ramachandran codes
title_short Protein structural similarity search by Ramachandran codes
title_sort protein structural similarity search by ramachandran codes
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2194796/
https://www.ncbi.nlm.nih.gov/pubmed/17716377
http://dx.doi.org/10.1186/1471-2105-8-307
work_keys_str_mv AT loweicheng proteinstructuralsimilaritysearchbyramachandrancodes
AT huangpojung proteinstructuralsimilaritysearchbyramachandrancodes
AT changchihhung proteinstructuralsimilaritysearchbyramachandrancodes
AT lyupingchiang proteinstructuralsimilaritysearchbyramachandrancodes