Clustering protein environments for function prediction: finding PROSITE motifs in 3D

Bibliographic Details
Main authors: Yoon, Sungroh; Ebert, Jessica C; Chung, Eui-Young; De Micheli, Giovanni; Altman, Russ B
Format: Text
Language: English
Published: BioMed Central 2007
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1892080/
https://www.ncbi.nlm.nih.gov/pubmed/17570144
http://dx.doi.org/10.1186/1471-2105-8-S4-S10
author Yoon, Sungroh
Ebert, Jessica C
Chung, Eui-Young
De Micheli, Giovanni
Altman, Russ B
collection PubMed
description BACKGROUND: Structural genomics initiatives are producing increasing numbers of three-dimensional (3D) structures for which there is little functional information. Structure-based annotation of molecular function is therefore becoming critical. We previously presented FEATURE, a method for describing microenvironments around functional sites in proteins. However, FEATURE uses supervised machine learning and so is limited to building models for sites of known importance and location. We hypothesized that there are a large number of sites in proteins that are associated with function that have not yet been recognized. Toward that end, we have developed a method for clustering protein microenvironments in order to evaluate the potential for discovering novel sites that have not been previously identified. RESULTS: We have prototyped a computational method for rapid clustering of millions of microenvironments in order to discover residues whose surrounding environments are similar and which may therefore share a functional or structural role. We clustered nearly 2,000,000 environments from 9,600 protein chains and defined 4,550 clusters. As a preliminary validation, we asked whether known 3D environments associated with PROSITE motifs were "rediscovered". We found examples of clusters highly enriched for residues that share PROSITE sequence motifs. CONCLUSION: Our results demonstrate that we can cluster protein environments successfully using a simplified representation and K-means clustering algorithm. The rediscovery of known 3D motifs allows us to calibrate the size and intercluster distances that characterize useful clusters. This information will then allow us to find new clusters with similar characteristics that represent novel structural or functional sites.
format Text
id pubmed-1892080
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1892080 2007-06-15. Clustering protein environments for function prediction: finding PROSITE motifs in 3D. Yoon, Sungroh; Ebert, Jessica C; Chung, Eui-Young; De Micheli, Giovanni; Altman, Russ B. BMC Bioinformatics (Proceedings). BioMed Central, 2007-05-22. /pmc/articles/PMC1892080/ /pubmed/17570144 http://dx.doi.org/10.1186/1471-2105-8-S4-S10 Text, en. Copyright © 2007 Yoon et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Clustering protein environments for function prediction: finding PROSITE motifs in 3D
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1892080/
https://www.ncbi.nlm.nih.gov/pubmed/17570144
http://dx.doi.org/10.1186/1471-2105-8-S4-S10
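The abstract in the description field outlines two computational steps: cluster residue microenvironments with K-means, then check whether any cluster is enriched for residues that share a PROSITE motif. The Python below is a minimal sketch of those two steps under stated assumptions. It is not the authors' code. It assumes each environment is already a fixed-length numeric vector, because the record does not specify the "simplified representation" that was used. It uses plain Lloyd's K-means with random initialisation. It scores enrichment with a fold change, (k/n)/(K/N), and a hypergeometric upper-tail probability; the hypergeometric test is this sketch's own choice. The names `kmeans`, `hypergeom_sf`, and `cluster_enrichment` are hypothetical.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means on equal-length float vectors (hypothetical helper).

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Initialise by sampling k distinct input points (simple random init, not k-means++).
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K motif residues, n drawn)."""
    upper = min(K, n)
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / math.comb(N, n)


def cluster_enrichment(labels, has_motif):
    """Map cluster -> (size n, motif hits k, fold enrichment, hypergeometric p-value)."""
    N = len(labels)
    K = sum(has_motif)
    sizes = Counter(labels)
    hits = Counter(lab for lab, m in zip(labels, has_motif) if m)
    result = {}
    for c, n in sizes.items():
        k = hits.get(c, 0)
        fold = (k / n) / (K / N) if K else 0.0
        result[c] = (n, k, fold, hypergeom_sf(k, N, K, n))
    return result


if __name__ == "__main__":
    # Toy "environments": two well-separated blobs in a 2-D feature space.
    rng = random.Random(1)
    blob_a = [[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(20)]
    blob_b = [[rng.gauss(5, 0.1), rng.gauss(5, 0.1)] for _ in range(20)]
    # Pretend every residue in blob_b carries the same (hypothetical) PROSITE motif.
    has_motif = [False] * 20 + [True] * 20
    _, labels = kmeans(blob_a + blob_b, k=2)
    for c, (n, k, fold, p) in sorted(cluster_enrichment(labels, has_motif).items()):
        print(f"cluster {c}: size={n} motif={k} fold={fold:.2f} p={p:.2e}")
```

In this toy run, the cluster that captures `blob_b` has a fold enrichment of 2.0 and a very small tail probability. The abstract reports nearly 2,000,000 environments and 4,550 clusters. At that scale a vectorised or distributed K-means would be needed. This pure-Python version only shows the logic.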