
Identifying structural domains of proteins using clustering

BACKGROUND: Protein structures are comprised of modular elements known as domains. These units are used and re-used over and over in nature, and usually serve some particular function in the structure. Thus it is useful to be able to break up a protein of interest into its component domains, prior t...


Bibliographic Details
Main author: Feldman, Howard J
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3534501/
https://www.ncbi.nlm.nih.gov/pubmed/23116496
http://dx.doi.org/10.1186/1471-2105-13-286
author Feldman, Howard J
collection PubMed
description BACKGROUND: Protein structures are composed of modular elements known as domains. These units are reused repeatedly in nature and usually serve a particular function in the structure. It is therefore useful to be able to break a protein of interest into its component domains, for example prior to similarity searching. Numerous computational methods exist for doing so, but most operate only on a single protein chain, and many are limited to making a series of cuts to the sequence, whereas domains can and do span multiple chains. RESULTS: This study presents a novel clustering-based approach to domain identification that works equally well on individual chains or entire complexes. The method is simple and fast, taking only a few milliseconds to run, and works by clustering either vectors representing secondary structure elements or buried alpha-carbon positions, using average-linkage clustering. Each resulting cluster corresponds to a domain of the structure. The method is competitive with others, achieving 70% agreement with SCOP on a large non-redundant data set, and 80% on a set more heavily weighted toward multi-domain proteins on which both SCOP and CATH agree. CONCLUSIONS: It is encouraging that a basic method such as this performs nearly as well as, or better than, some far more complex approaches. This suggests that protein domains are, for the most part, simply compact regions of structure with a higher density of buried contacts within themselves than between each other. Representing the structure as a set of points or vectors in space frees the method from artificial limitations that other approaches may depend upon.
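The abstract describes grouping alpha-carbon positions by average-linkage clustering, with each resulting cluster treated as a domain. The following is a minimal sketch of that general technique, not the author's implementation: the function name `average_linkage`, the distance `cutoff` used as a stopping rule, and the toy coordinates are all assumptions made for illustration, and the record does not say how the published method selects buried residues or decides when to stop merging.

```python
import math
from itertools import combinations


def average_linkage(points, cutoff):
    """Agglomerative average-linkage clustering of 3D points.

    Repeatedly merges the two clusters with the smallest mean pairwise
    distance and stops once that distance exceeds `cutoff`.
    `cutoff` is a hypothetical stopping rule, not taken from the article.
    Returns a sorted list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def mean_distance(a, b):
        # Average of all pairwise Euclidean distances between two clusters.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters under average linkage.
        (x, y), best = min(
            (
                (pair, mean_distance(clusters[pair[0]], clusters[pair[1]]))
                for pair in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if best > cutoff:
            break
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)


# Toy "alpha-carbon" coordinates: two compact groups about 20 units apart.
coords = [
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
    (20.0, 0.0, 0.0), (21.0, 0.0, 0.0), (20.0, 1.0, 0.0),
]
domains = average_linkage(coords, cutoff=5.0)
print(domains)  # [[0, 1, 2], [3, 4, 5]]
```

Because the input is just a list of coordinates, the same call would work whether the points come from one chain or from several, which is the property the abstract emphasizes. This naive version recomputes every cluster-pair distance on each merge, so it is only suitable for small inputs; a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` would likely be preferable in practice.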
format Online
Article
Text
id pubmed-3534501
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3534501 2013-01-03 Identifying structural domains of proteins using clustering Feldman, Howard J BMC Bioinformatics Methodology Article
BioMed Central 2012-11-01 /pmc/articles/PMC3534501/ /pubmed/23116496 http://dx.doi.org/10.1186/1471-2105-13-286 Text en Copyright ©2012 Feldman; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Identifying structural domains of proteins using clustering
topic Methodology Article