
Automatic classification of protein structures using low-dimensional structure space mappings


Bibliographic Details
Main authors: Asarnow, Daniel; Singh, Rahul
Format: Online article, text
Language: English
Published: BioMed Central 2014
Subjects: Proceedings
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4016610/
https://www.ncbi.nlm.nih.gov/pubmed/24564500
http://dx.doi.org/10.1186/1471-2105-15-S2-S1
Collection: PubMed
Abstract

BACKGROUND: Protein function is closely intertwined with protein structure. Discovering meaningful structure-function relationships is of utmost importance in protein biochemistry and has led to the creation of high-quality, manually curated classification databases, such as the gold-standard SCOP (Structural Classification of Proteins) database. SCOP and counterparts such as CATH provide a detailed and comprehensive description of the structural and evolutionary relationships among proteins of known structure and are widely used in structural and computational biology. Because manual classification is both subjective and highly laborious, automated classification of novel structures is an increasingly active area of research. Methods for automated structure classification have become even more important in recent years, owing to the explosion in the number of solved structures produced by various structural biology initiatives. In this paper we propose an approach to structure classification based on creating and tessellating low-dimensional maps of the protein structure space (MPSS). Given a set of protein structures, an MPSS is a low-dimensional embedding of the structural similarity-based distances between the molecules. In an MPSS, the proteins under consideration (such as all the proteins in the PDB, or sub-samplings thereof) are represented as a point cloud, and structural relatedness maps to spatial adjacency of the points. We present methods and results showing that MPSS can be used to create tessellations of the protein space comparable to the clade systems within SCOP. Although we use SCOP as the gold standard, the proposed approach is equally applicable to other structural classifications.
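The core MPSS idea, embedding a matrix of pairwise structural distances so that related proteins land near one another, can be sketched with classical (Torgerson) multidimensional scaling. This is a minimal, dependency-free sketch under stated assumptions: the input is a symmetric distance matrix (for example, one derived from CE, Dali, FATCAT or MATT alignments; how alignment scores become distances is not specified here), classical MDS is only one of several MDS variants and may not be the one the authors used, and the function names `double_center`, `top_eigenpair` and `classical_mds` are hypothetical.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def double_center(dist: Matrix) -> Matrix:
    """Gram matrix B = -1/2 * J D^(2) J used by classical MDS (J = centering matrix)."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + grand)
             for j in range(n)] for i in range(n)]


def top_eigenpair(b: Matrix, iters: int = 2000, seed: int = 0) -> Tuple[float, List[float]]:
    """Largest (algebraic) eigenvalue and unit eigenvector of the symmetric matrix b.

    Power iteration runs on b + s*I, where s is a Gershgorin bound, so every shifted
    eigenvalue is non-negative and the dominant one corresponds to b's largest eigenvalue.
    """
    n = len(b)
    shift = max(sum(abs(x) for x in row) for row in b)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # b is (numerically) the zero matrix
            break
        v = [x / norm for x in w]
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient on the unshifted matrix
    return lam, v


def classical_mds(dist: Matrix, k: int = 2) -> Matrix:
    """Embed n items in k dimensions so that Euclidean distances approximate dist."""
    b = double_center(dist)
    n = len(b)
    coords = [[0.0] * k for _ in range(n)]
    for axis in range(k):
        lam, v = top_eigenpair(b, seed=axis)
        scale = math.sqrt(max(lam, 0.0))  # drop negative (non-Euclidean) eigenvalues
        for i in range(n):
            coords[i][axis] = scale * v[i]
        # Deflate so the next pass finds the next-largest eigenpair.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


if __name__ == "__main__":
    # Toy distances for three items lying on a line at positions 0, 1 and 3.
    toy = [[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
    for point in classical_mds(toy, k=1):
        print(point)
```

The toy run recovers the three positions centred on their mean (about -1.33, -0.33 and 1.67, possibly with all signs flipped, since MDS fixes coordinates only up to rotation and reflection). Negative eigenvalues, which appear when alignment-derived distances are not exactly Euclidean, are clamped to zero, so only the Euclidean part of the distance structure is embedded. For more than a few hundred structures, a dense eigensolver such as NumPy's `numpy.linalg.eigh` would be the practical replacement for the power iteration.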
METHODS: In the proposed approach, we first construct MPSS using pairwise alignment distances obtained from four established structure alignment algorithms (CE, Dali, FATCAT and MATT). The low-dimensional embeddings are then computed using multidimensional scaling (MDS). Next, using the remotely homologous Superfamily and Fold levels of the hierarchical SCOP database, we determine a distance threshold that relates adjacency in the low-dimensional map to functional relationships. The optimal threshold is the value that maximizes the total true classification rate with respect to the SCOP classification. We also show that determining such a threshold is often straightforward once the structural relationships are represented using MPSS.

RESULTS AND CONCLUSION: We demonstrate that MPSS are highly accurate representations of protein fold space and enable automatic classification of SCOP Superfamily- and Fold-level relationships. The results of our automatic classification approach are remarkably similar to the distantly homologous Superfamily level and the quite remotely homologous Fold level of SCOP. The significance of these results is underlined by the fact that most automated methods developed thus far match only the closest-homology Family level of the SCOP hierarchy and tend to differ considerably at the Superfamily and Fold levels. Furthermore, our research demonstrates that projection into a low-dimensional space using MDS is a better noise-reducing transformation of pairwise distances than the various probability- and alignment-length-based transformations currently used by structure alignment algorithms.
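Choosing the distance threshold can be sketched as a one-dimensional search over candidate distances in the map. This sketch assumes that "total true classification rate" means the sum of the true-positive rate (same-class pairs placed within the threshold) and the true-negative rate (different-class pairs placed beyond it); the abstract does not define the term, and plain pairwise accuracy is another plausible reading. The names `labelled_pairs` and `best_threshold` are hypothetical, and the labels stand in for SCOP Superfamily or Fold identifiers.

```python
import math
from itertools import combinations
from typing import Hashable, List, Sequence, Tuple


def labelled_pairs(coords: Sequence[Sequence[float]],
                   labels: Sequence[Hashable]) -> List[Tuple[float, bool]]:
    """(map distance, same class?) for every unordered pair of proteins."""
    return [(math.dist(coords[i], coords[j]), labels[i] == labels[j])
            for i, j in combinations(range(len(coords)), 2)]


def best_threshold(coords: Sequence[Sequence[float]],
                   labels: Sequence[Hashable]) -> Tuple[float, float]:
    """Threshold t maximising TPR + TNR, where 'distance <= t' predicts 'same class'."""
    pairs = labelled_pairs(coords, labels)
    n_pos = sum(1 for _, same in pairs if same)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both same-class and different-class pairs")
    best_t, best_score = float("nan"), -1.0
    for t in sorted({d for d, _ in pairs}):  # every observed distance is a candidate
        tp = sum(1 for d, same in pairs if same and d <= t)
        tn = sum(1 for d, same in pairs if not same and d > t)
        score = tp / n_pos + tn / n_neg
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


if __name__ == "__main__":
    # Two well-separated toy clusters standing in for two superfamilies.
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    sf = ["a", "a", "b", "b"]
    print(best_threshold(pts, sf))  # (1.0, 2.0)
```

Summing the two rates, rather than counting correct pairs, keeps the far more numerous different-class pairs from dominating the score. The nested counting is quadratic in the number of pairs; sorting the pairs once and sweeping the threshold would bring it down to O(P log P) for P pairs.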
Record ID: pubmed-4016610
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Bioinformatics (Proceedings), BioMed Central, published 2014-01-24.
Index entry: pubmed-4016610, 2014-05-23.
Copyright © 2014 Asarnow and Singh; licensee BioMed Central Ltd.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.