Automatic classification of protein structures relying on similarities between alignments
BACKGROUND: Identification of protein structural cores requires isolation of sets of proteins all sharing the same subset of structural motifs. In the context of an ever-growing number of available 3D protein structures, standard and automatic clustering algorithms require adaptations so as to allow f...
Main authors: | Santini, Guillaume; Soldano, Henry; Pothier, Joël |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3534633/ https://www.ncbi.nlm.nih.gov/pubmed/22974051 http://dx.doi.org/10.1186/1471-2105-13-233 |
_version_ | 1782475371322540032 |
---|---|
author | Santini, Guillaume Soldano, Henry Pothier, Joël |
author_facet | Santini, Guillaume Soldano, Henry Pothier, Joël |
author_sort | Santini, Guillaume |
collection | PubMed |
description | BACKGROUND: Identifying protein structural cores requires isolating sets of proteins that all share the same subset of structural motifs. As the number of available 3D protein structures keeps growing, standard automatic clustering algorithms must be adapted to identify such sets of proteins efficiently. RESULTS: Two 3D structures are declared similar or not according to the local similarities of their matching substructures in a structural alignment. This binary relation can be represented as a graph of similarities in which a node represents a 3D protein structure and an edge states that two 3D protein structures are similar. Classifying proteins into structural families can therefore be viewed as a graph clustering task. Unfortunately, because such a graph encodes only pairwise similarity information, clustering algorithms may place in the same cluster a subset of 3D structures that do not share a common substructure. To overcome this drawback, we first define a ternary similarity on a triple of 3D structures as a constraint to be satisfied by the graph of similarities. This ternary constraint takes into account similarities between pairwise alignments, so as to ensure that the three protein structures involved do share some common substructure. We propose a modification algorithm that eliminates edges from the original graph of similarities and yields a reduced graph in which no ternary constraint is violated. Our approach is thus to first build a graph of similarities, then reduce the graph with the modification algorithm, and finally apply a standard graph clustering algorithm to the reduced graph. This method was used to classify ASTRAL-40 non-redundant protein domains, with significant pairwise similarities identified by Yakusa, a program devised for rapid 3D structure alignments. CONCLUSIONS: We show that filtering similarities with ternary similarity constraints before a standard graph-based clustering process i) improves the separation of proteins of different classes and consequently ii) improves the classification quality of standard graph-based clustering algorithms with respect to the reference classification SCOP. |
format | Online Article Text |
id | pubmed-3534633 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3534633 2013-01-03 Automatic classification of protein structures relying on similarities between alignments Santini, Guillaume Soldano, Henry Pothier, Joël BMC Bioinformatics Research Article BioMed Central 2012-09-14 /pmc/articles/PMC3534633/ /pubmed/22974051 http://dx.doi.org/10.1186/1471-2105-13-233 Text en Copyright ©2012 Santini et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Automatic classification of protein structures relying on similarities between alignments |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3534633/ https://www.ncbi.nlm.nih.gov/pubmed/22974051 http://dx.doi.org/10.1186/1471-2105-13-233 |
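
The graph-reduction step summarized in the description field (remove edges from the similarity graph until no triple of structures violates the ternary constraint) can be illustrated with a small sketch. This is not the authors' algorithm: the record does not say how the ternary constraint is computed or which edge is eliminated, so the representation of an edge as a set of shared motif labels, the `min_shared` threshold, and the rule of dropping the weakest edge of a violating triangle are all assumptions made for illustration.

```python
from itertools import combinations


def reduce_graph(edges, min_shared=1):
    """Drop edges until every triangle shares at least `min_shared` motifs.

    `edges` maps frozenset({u, v}) to the set of motif labels that the
    pairwise alignment of u and v has in common. This representation and
    the weakest-edge removal rule are hypothetical, not taken from the paper.
    """
    edges = dict(edges)  # work on a copy; the input graph is left untouched
    changed = True
    while changed:
        changed = False
        nodes = sorted({n for e in edges for n in e})
        for a, b, c in combinations(nodes, 3):
            tri = [frozenset(p) for p in ((a, b), (a, c), (b, c))]
            if not all(e in edges for e in tri):
                continue  # not a triangle, so no ternary constraint applies
            common = edges[tri[0]] & edges[tri[1]] & edges[tri[2]]
            if len(common) < min_shared:
                # Assumed rule: remove the edge supported by the fewest
                # motifs, breaking ties by node names for determinism.
                weakest = min(tri, key=lambda e: (len(edges[e]), sorted(e)))
                del edges[weakest]
                changed = True
                break  # the graph changed, so rescan from the start
    return edges


# Toy input: A, B, C are pairwise similar but share no common motif,
# while D, E, F all share motif "m9".
example = {
    frozenset(("A", "B")): {"m1", "m2"},
    frozenset(("A", "C")): {"m3"},
    frozenset(("B", "C")): {"m4"},
    frozenset(("D", "E")): {"m9"},
    frozenset(("D", "F")): {"m9"},
    frozenset(("E", "F")): {"m9"},
}
reduced = reduce_graph(example)
```

In this toy input the triangle A, B, C loses one edge because its three alignments have nothing in common, while the triangle D, E, F is kept intact. The reduced graph would then be handed to whatever standard graph clustering algorithm is in use.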