
Towards an automatic classification of protein structural domains based on structural similarity

BACKGROUND: Formal classification of a large collection of protein structures aids the understanding of evolutionary relationships among them. Classifications involving manual steps, such as SCOP and CATH, face the challenge of increasing volume of available structures. Automatic methods such as FSS...


Bibliographic Details
Main Authors:	Sam, Vichetra, Tai, Chin-Hsien, Garnier, Jean, Gibrat, Jean-Francois, Lee, Byungkook, Munson, Peter J
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2267780/ https://www.ncbi.nlm.nih.gov/pubmed/18237410 http://dx.doi.org/10.1186/1471-2105-9-74

_version_	1782151656494858240
author	Sam, Vichetra Tai, Chin-Hsien Garnier, Jean Gibrat, Jean-Francois Lee, Byungkook Munson, Peter J
author_sort	Sam, Vichetra
collection	PubMed
description	BACKGROUND: Formal classification of a large collection of protein structures aids the understanding of evolutionary relationships among them. Classifications involving manual steps, such as SCOP and CATH, face the challenge of increasing volume of available structures. Automatic methods such as FSSP or Dali Domain Dictionary, yield divergent classifications, for reasons not yet fully investigated. One possible reason is that the pairwise similarity scores used in automatic classification do not adequately reflect the judgments made in manual classification. Another possibility is the difference between manual and automatic classification procedures. We explore the degree to which these two factors might affect the final classification. RESULTS: We use DALI, SHEBA and VAST pairwise scores on the SCOP C class domains, to investigate a variety of hierarchical clustering procedures. The constructed dendrogram is cut in a variety of ways to produce a partition, which is compared to the SCOP fold classification. Ward's method dendrograms led to partitions closest to the SCOP fold classification. Dendrogram- or tree-cutting strategies fell into four categories according to the similarity of resulting partitions to the SCOP fold partition. Two strategies which optimize similarity to SCOP, gave an average of 72% true positives rate (TPR), at a 1% false positive rate. Cutting the largest size cluster at each step gave an average of 61% TPR which was one of the best strategies not making use of prior knowledge of SCOP. Cutting the longest branch at each step produced one of the worst strategies. We also developed a method to detect irreducible differences between the best possible automatic partitions and SCOP, regardless of the cutting strategy. These differences are substantial. Visual examination of hard-to-classify proteins confirms our previous finding, that global structural similarity of domains is not the only criterion used in the SCOP classification. 
CONCLUSION: Different clustering procedures give rise to different levels of agreement between automatic and manual protein classifications. None of the tested procedures completely eliminates the divergence between automatic and manual protein classifications. Achieving full agreement between these two approaches would apparently require additional information.
format	Text
id	pubmed-2267780
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-22677802008-03-18 Towards an automatic classification of protein structural domains based on structural similarity Sam, Vichetra Tai, Chin-Hsien Garnier, Jean Gibrat, Jean-Francois Lee, Byungkook Munson, Peter J BMC Bioinformatics Research Article BACKGROUND: Formal classification of a large collection of protein structures aids the understanding of evolutionary relationships among them. Classifications involving manual steps, such as SCOP and CATH, face the challenge of increasing volume of available structures. Automatic methods such as FSSP or Dali Domain Dictionary, yield divergent classifications, for reasons not yet fully investigated. One possible reason is that the pairwise similarity scores used in automatic classification do not adequately reflect the judgments made in manual classification. Another possibility is the difference between manual and automatic classification procedures. We explore the degree to which these two factors might affect the final classification. RESULTS: We use DALI, SHEBA and VAST pairwise scores on the SCOP C class domains, to investigate a variety of hierarchical clustering procedures. The constructed dendrogram is cut in a variety of ways to produce a partition, which is compared to the SCOP fold classification. Ward's method dendrograms led to partitions closest to the SCOP fold classification. Dendrogram- or tree-cutting strategies fell into four categories according to the similarity of resulting partitions to the SCOP fold partition. Two strategies which optimize similarity to SCOP, gave an average of 72% true positives rate (TPR), at a 1% false positive rate. Cutting the largest size cluster at each step gave an average of 61% TPR which was one of the best strategies not making use of prior knowledge of SCOP. Cutting the longest branch at each step produced one of the worst strategies. 
We also developed a method to detect irreducible differences between the best possible automatic partitions and SCOP, regardless of the cutting strategy. These differences are substantial. Visual examination of hard-to-classify proteins confirms our previous finding, that global structural similarity of domains is not the only criterion used in the SCOP classification. CONCLUSION: Different clustering procedures give rise to different levels of agreement between automatic and manual protein classifications. None of the tested procedures completely eliminates the divergence between automatic and manual protein classifications. Achieving full agreement between these two approaches would apparently require additional information. BioMed Central 2008-01-31 /pmc/articles/PMC2267780/ /pubmed/18237410 http://dx.doi.org/10.1186/1471-2105-9-74 Text en Copyright © 2008 Sam et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( (http://creativecommons.org/licenses/by/2.0) ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Towards an automatic classification of protein structural domains based on structural similarity
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2267780/ https://www.ncbi.nlm.nih.gov/pubmed/18237410 http://dx.doi.org/10.1186/1471-2105-9-74
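
The description field above mentions cutting a dendrogram by splitting the largest cluster at each step and scoring the resulting partition against the SCOP fold classification by true and false positive rates. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the dendrogram is already built (the Ward's method clustering step is not shown), represents it as nested tuples, and assumes that the rates are counted over pairs of domains; the record does not confirm that definition. The domain names, fold labels, and function names are hypothetical.

```python
from itertools import combinations

# Toy dendrogram (hypothetical): a leaf is a domain label (str),
# an internal node is a (left, right) tuple of sub-trees.
tree = ((("d1", "d2"), "d3"), ("d4", ("d5", "d6")))

# Toy reference classification (hypothetical fold labels).
reference = {"d1": "A", "d2": "A", "d3": "B", "d4": "B", "d5": "B", "d6": "B"}


def leaves(node):
    """Return the domain labels under a dendrogram node."""
    if isinstance(node, str):
        return [node]
    return leaves(node[0]) + leaves(node[1])


def cut_largest_first(root, n_clusters):
    """Split the cluster with the most leaves until n_clusters remain."""
    clusters = [root]
    while len(clusters) < n_clusters:
        # Only internal nodes can be split further.
        splittable = [c for c in clusters if not isinstance(c, str)]
        if not splittable:
            break
        biggest = max(splittable, key=lambda c: len(leaves(c)))
        clusters.remove(biggest)
        clusters.extend(biggest)  # replace the node by its two children
    return [set(leaves(c)) for c in clusters]


def pair_rates(partition, ref):
    """Pairwise TPR and FPR of a partition against reference labels.

    Assumption: a 'positive' is a pair of domains sharing a reference label.
    """
    cluster_of = {d: i for i, c in enumerate(partition) for d in c}
    tp = fp = pos = neg = 0
    for a, b in combinations(sorted(ref), 2):
        same_auto = cluster_of[a] == cluster_of[b]
        if ref[a] == ref[b]:
            pos += 1
            tp += same_auto
        else:
            neg += 1
            fp += same_auto
    return tp / pos, fp / neg


for k in (2, 3):
    partition = cut_largest_first(tree, k)
    tpr, fpr = pair_rates(partition, reference)
    print(k, partition, round(tpr, 3), round(fpr, 3))
```

Sweeping the number of clusters in this way traces out a TPR/FPR trade-off, which is one plausible reading of how a figure such as "TPR at a 1% false positive rate" could be obtained from a given cutting strategy.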
