Saturating representation of loop conformational fragments in structure databanks

BACKGROUND: Short fragments of proteins are fundamental starting points in various structure prediction applications, such as in fragment based loop modeling methods but also in various full structure build-up procedures. The applicability and performance of these approaches depend on the availabili...

Bibliographic Details
Main authors:	Fernandez-Fuentes, Narcis; Fiser, András
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1574324/ https://www.ncbi.nlm.nih.gov/pubmed/16820050 http://dx.doi.org/10.1186/1472-6807-6-15

_version_	1782130286328283136
author	Fernandez-Fuentes, Narcis; Fiser, András
author_facet	Fernandez-Fuentes, Narcis; Fiser, András
author_sort	Fernandez-Fuentes, Narcis
collection	PubMed
description	BACKGROUND: Short fragments of proteins are fundamental starting points in various structure prediction applications, such as fragment-based loop modeling methods and various full-structure build-up procedures. The applicability and performance of these approaches depend on the availability of short fragments in structure databanks. RESULTS: We studied the representation of protein loop fragments up to 14 residues in length. All possible query fragments found in sequence databases (Sequence Space) were clustered and cross-referenced with available structural fragments in the Protein Data Bank (Structure Space). We found that the expansion of the PDB in the last few years resulted in a dense coverage of loop conformational fragments. For every loop of length 8 in the current Sequence Space there is at least one loop in Structure Space with 50% or higher sequence identity. By correlating sequence and structure clusters of loops we found that a 50% sequence identity generally guarantees structural similarity. Coverage at the 50% sequence identity cutoff drops to 96, 94, 68, 53, 33 and 13% for loops of length 9, 10, 11, 12, 13, and 14, respectively. Every loop in the current Sequence Space, at any length up to 14 residues, is matched by a conformational segment that shares at least 20% sequence identity. This minimum observed identity is 40% for loops of 12 residues or shorter and is as high as 50% for loops of 10 residues or shorter. We also assessed the impact of rapidly growing sequence databanks on the estimated number of new loop conformations and found that, while the number of sequentially unique sequence segments increased about sixfold during the last five years, there are almost no unique conformational segments among these fragments of up to 12 residues. 
CONCLUSION: The results suggest that fragment-based prediction approaches are no longer limited by the completeness of fragments in databanks but rather by the effectiveness of the scoring and search algorithms used to locate them. The current favorable coverage and observed trends will be further accentuated by the progress of the Protein Structure Initiative, which targets new protein folds and ultimately aims at providing an exhaustive coverage of the structure space.
format	Text
id	pubmed-1574324
institution	National Center for Biotechnology Information
language	English
publishDate	2006
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1574324 2006-09-23 Saturating representation of loop conformational fragments in structure databanks Fernandez-Fuentes, Narcis; Fiser, András BMC Struct Biol Research Article BioMed Central 2006-07-04 /pmc/articles/PMC1574324/ /pubmed/16820050 http://dx.doi.org/10.1186/1472-6807-6-15 Text en Copyright © 2006 Fernandez-Fuentes and Fiser; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Fernandez-Fuentes, Narcis Fiser, András Saturating representation of loop conformational fragments in structure databanks
title	Saturating representation of loop conformational fragments in structure databanks
title_full	Saturating representation of loop conformational fragments in structure databanks
title_fullStr	Saturating representation of loop conformational fragments in structure databanks
title_full_unstemmed	Saturating representation of loop conformational fragments in structure databanks
title_short	Saturating representation of loop conformational fragments in structure databanks
title_sort	saturating representation of loop conformational fragments in structure databanks
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1574324/ https://www.ncbi.nlm.nih.gov/pubmed/16820050 http://dx.doi.org/10.1186/1472-6807-6-15
work_keys_str_mv	AT fernandezfuentesnarcis saturatingrepresentationofloopconformationalfragmentsinstructuredatabanks AT fiserandras saturatingrepresentationofloopconformationalfragmentsinstructuredatabanks
