Identifying structural domains of proteins using clustering

BACKGROUND: Protein structures are comprised of modular elements known as domains. These units are used and re-used over and over in nature, and usually serve some particular function in the structure. Thus it is useful to be able to break up a protein of interest into its component domains, prior t...

Bibliographic Details
Main author:	Feldman, Howard J
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3534501/ https://www.ncbi.nlm.nih.gov/pubmed/23116496 http://dx.doi.org/10.1186/1471-2105-13-286

_version_	1782475342480408576
author	Feldman, Howard J
author_facet	Feldman, Howard J
author_sort	Feldman, Howard J
collection	PubMed
description	BACKGROUND: Protein structures are comprised of modular elements known as domains. These units are used and re-used over and over in nature, and usually serve some particular function in the structure. Thus it is useful to be able to break up a protein of interest into its component domains, prior to similarity searching, for example. Numerous computational methods exist for doing so, but most operate only on a single protein chain and many are limited to making a series of cuts to the sequence, while domains can and do span multiple chains. RESULTS: This study presents a novel clustering-based approach to domain identification, which works equally well on individual chains or entire complexes. The method is simple and fast, taking only a few milliseconds to run, and works by clustering either vectors representing secondary structure elements, or buried alpha-carbon positions, using average-linkage clustering. Each resulting cluster corresponds to a domain of the structure. The method is competitive with others, achieving 70% agreement with SCOP on a large non-redundant data set, and 80% on a set more heavily weighted in multi-domain proteins on which both SCOP and CATH agree. CONCLUSIONS: It is encouraging that a basic method such as this performs nearly as well as or better than some far more complex approaches. This suggests that protein domains are indeed for the most part simply compact regions of structure with a higher density of buried contacts within themselves than between each other. Representing the structure as a set of points or vectors in space allows us to break free of any artificial limitations that other approaches may depend upon.
format	Online Article Text
id	pubmed-3534501
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3534501 2013-01-03 Identifying structural domains of proteins using clustering Feldman, Howard J BMC Bioinformatics Methodology Article BioMed Central 2012-11-01 /pmc/articles/PMC3534501/ /pubmed/23116496 http://dx.doi.org/10.1186/1471-2105-13-286 Text en Copyright ©2012 Feldman; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Methodology Article Feldman, Howard J Identifying structural domains of proteins using clustering
title	Identifying structural domains of proteins using clustering
title_full	Identifying structural domains of proteins using clustering
title_fullStr	Identifying structural domains of proteins using clustering
title_full_unstemmed	Identifying structural domains of proteins using clustering
title_short	Identifying structural domains of proteins using clustering
title_sort	identifying structural domains of proteins using clustering
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3534501/ https://www.ncbi.nlm.nih.gov/pubmed/23116496 http://dx.doi.org/10.1186/1471-2105-13-286
work_keys_str_mv	AT feldmanhowardj identifyingstructuraldomainsofproteinsusingclustering
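
The abstract describes the method only in outline: positions of buried alpha-carbons (or vectors for secondary structure elements) are grouped by average-linkage clustering, and each resulting cluster is read as a domain. The following is a minimal sketch of that general idea in Python, not the author's implementation. The function name `cluster_domains`, the distance `cutoff` used as a stopping rule, and the toy coordinates are all assumptions made for illustration; the record does not say how buried residues are selected or how the number of clusters is decided.

```python
import math
from itertools import combinations


def cluster_domains(points, cutoff):
    """Naive average-linkage clustering of 3D points (hypothetical helper).

    points: list of (x, y, z) tuples, e.g. alpha-carbon coordinates.
    cutoff: assumed stopping rule; merging stops once the closest pair of
            clusters has an average inter-point distance above this value.
    Returns a sorted list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def avg_dist(a, b):
        # Average linkage: mean distance over all cross-cluster point pairs.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: avg_dist(clusters[pair[0]], clusters[pair[1]]),
        )
        if avg_dist(clusters[x], clusters[y]) > cutoff:
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)

    return sorted(sorted(c) for c in clusters)


# Toy data (made up): two compact groups of points about 30 units apart.
toy_coords = [
    (0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0), (3.0, 3.0, 0.0),
    (30.0, 0.0, 0.0), (33.0, 0.0, 0.0), (30.0, 3.0, 0.0), (33.0, 3.0, 0.0),
]
domains = cluster_domains(toy_coords, cutoff=15.0)
print(domains)
```

On this toy input the two compact groups come out as separate clusters, which is the sense in which the abstract equates clusters with domains. The pairwise search here is deliberately simple and costs far more than a production implementation would; a library routine for hierarchical clustering would be the usual choice for real structures.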
