A topological algorithm for identification of structural domains of proteins

BACKGROUND: Identification of the structural domains of proteins is important for our understanding of the organizational principles and mechanisms of protein folding, and for insights into protein function and evolution. Algorithmic methods of dissecting protein of known structure into domains deve...


Bibliographic Details
Main authors:	Emmert-Streib, Frank; Mushegian, Arcady
Format:	Text
Language:	English
Published:	BioMed Central 2007
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1933582/ https://www.ncbi.nlm.nih.gov/pubmed/17608939 http://dx.doi.org/10.1186/1471-2105-8-237

author	Emmert-Streib, Frank; Mushegian, Arcady
collection	PubMed
description	BACKGROUND: Identification of the structural domains of proteins is important for our understanding of the organizational principles and mechanisms of protein folding, and for insights into protein function and evolution. Algorithmic methods developed so far for dissecting proteins of known structure into domains are based on an examination of multiple geometrical, physical and topological features. Successful as many of these approaches are, they employ many heuristics, and it is not clear whether they illuminate any deep underlying principles of protein domain organization. Other well-performing domain dissection methods rely on comparative sequence analysis. These methods are applicable to sequences with known and unknown structure alike, and their success highlights a fundamental principle of protein modularity, but this does not directly improve our understanding of protein spatial structure. RESULTS: We present a novel graph-theoretical algorithm for the identification of domains in proteins with known three-dimensional structure. We represent the protein structure as an undirected, unweighted and unlabeled graph whose nodes correspond to the secondary structure elements and whose edges represent physical proximity of at least one pair of alpha carbon atoms from two elements. Domains are identified as constrained partitions of the graph, corresponding to sets of vertices obtained by the maximization of the cycle distributions found in the graph. The decision to accept or reject a tentative cut position is based on a specific classifier. When a partition is accepted, the algorithm is applied iteratively to each of the resulting subgraphs, and it terminates automatically when partitions are no longer accepted. The distribution of cycles is the only type of information on which the decision about protein dissection is based. Despite the bare-bones simplicity of the approach, our algorithm approaches the best heuristic algorithms in accuracy. CONCLUSION: Our graph-theoretical algorithm uses only topological information present in the protein structure itself to find the domains and does not rely on any geometrical or physical information about the protein molecule. Perhaps unexpectedly, these drastic constraints on resources, which result in a seemingly approximate description of protein structures and leave only a handful of parameters available for analysis, do not lead to any significant deterioration of algorithm accuracy. It appears that protein structures can be rigorously treated as topological rather than geometrical objects and that the majority of information about protein domains can be inferred from the coarse-grained measure of pairwise proximity between secondary structure elements.
format	Text
id	pubmed-1933582
institution	National Center for Biotechnology Information
language	English
publishDate	2007
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1933582 2007-07-27 A topological algorithm for identification of structural domains of proteins Emmert-Streib, Frank; Mushegian, Arcady BMC Bioinformatics Research Article BioMed Central 2007-07-03 /pmc/articles/PMC1933582/ /pubmed/17608939 http://dx.doi.org/10.1186/1471-2105-8-237 Text en Copyright © 2007 Emmert-Streib and Mushegian; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	A topological algorithm for identification of structural domains of proteins
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1933582/ https://www.ncbi.nlm.nih.gov/pubmed/17608939 http://dx.doi.org/10.1186/1471-2105-8-237
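
The graph representation described in the abstract (nodes are secondary structure elements, and an edge joins two elements when at least one pair of their alpha carbon atoms is physically close) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `build_sse_graph`, the input layout, and the 7.0 angstrom cutoff are all assumptions made for illustration, since the record does not state a distance threshold. The node labels are kept only for bookkeeping; the graph described in the abstract is unlabeled. The partitioning step based on cycle distributions and the classifier are not sketched, because the record gives no details about them.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available in Python 3.8+


def build_sse_graph(sse_coords, cutoff=7.0):
    """Build an undirected, unweighted contact graph of secondary structure elements.

    sse_coords: dict mapping an element name to a list of (x, y, z)
                alpha-carbon coordinates (hypothetical input layout).
    cutoff:     proximity threshold in angstroms (assumed value, not from the record).
    Returns a dict mapping each element name to the set of its neighbours.
    """
    graph = {name: set() for name in sse_coords}
    for a, b in combinations(sse_coords, 2):
        # An edge exists if at least one pair of alpha carbons is within the cutoff.
        if any(dist(p, q) <= cutoff for p in sse_coords[a] for q in sse_coords[b]):
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Toy coordinates, invented for the example (not taken from any real structure).
toy = {
    "H1": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "E1": [(5.0, 0.0, 0.0)],
    "E2": [(30.0, 0.0, 0.0)],
}
graph = build_sse_graph(toy)
```

In the toy input, `H1` and `E1` have a pair of atoms 3.5 angstroms apart and become neighbours, while `E2` is far from both and stays isolated. An adjacency dictionary of sets is used here simply because it keeps the graph undirected and unweighted without any external library.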
