
Residue Cluster Classes: A Unified Protein Representation for Efficient Structural and Functional Classification



Bibliographic details
Main authors: Fontove, Fernando; Del Rio, Gabriel
Format: Online article, text
Language: English
Published: MDPI, 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7516957/
https://www.ncbi.nlm.nih.gov/pubmed/33286246
http://dx.doi.org/10.3390/e22040472
Abstract: Proteins are characterized by their structures and functions, and these two fundamental aspects of proteins are assumed to be related. To model such a relationship, a single representation to model both protein structure and function would be convenient, yet so far, the most effective models for protein structure or function classification do not rely on the same protein representation. Here we provide a computationally efficient implementation for large datasets to calculate residue cluster classes (RCCs) from protein three-dimensional structures and show that such representations enable a random forest algorithm to effectively learn the structural and functional classifications of proteins, according to the CATH and Gene Ontology criteria, respectively. RCCs are derived from residue contact maps built from different distance criteria, and we show that 7 or 8 Å with or without amino acid side-chain atoms rendered the best classification models. The potential use of a unified representation of proteins is discussed and possible future areas for improvement and exploration are presented.
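This record does not define RCCs in detail, so the following is only a rough illustration of the pipeline the abstract describes. It builds a residue contact map at a distance cutoff, where two residues count as in contact if any pair of their atoms lies within the cutoff. Passing all atoms or only backbone atoms per residue mimics the "with or without side-chain atoms" variants. It then enumerates maximal cliques of that map with the Bron–Kerbosch algorithm. The input format (one list of atom coordinates per residue), the function names, and the per-size clique counts returned by the hypothetical `cluster_size_profile` are all assumptions. They are a simplified stand-in and not the authors' RCC definition or implementation.

```python
import math
from collections import Counter
from itertools import combinations


def contact_map(residues, cutoff=7.0):
    """Build a residue contact graph (assumed input format).

    residues: list of lists of (x, y, z) atom coordinates, one list per residue.
    Two residues are in contact if any pair of their atoms is within `cutoff` Å.
    Returns a dict mapping residue index -> set of contacting residue indices.
    """
    adj = {i: set() for i in range(len(residues))}
    for i, j in combinations(range(len(residues)), 2):
        if any(math.dist(a, b) <= cutoff for a in residues[i] for b in residues[j]):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch plus pivoting."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex with most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


def cluster_size_profile(residues, cutoff=7.0, max_size=8):
    """Hypothetical simplified feature vector (not the paper's RCCs):
    counts of maximal cliques of size 1..max_size in the contact map."""
    counts = Counter(len(c) for c in maximal_cliques(contact_map(residues, cutoff)))
    return [counts.get(k, 0) for k in range(1, max_size + 1)]
```

For example, with single-atom residues at x = 0, 5, 10 and 100 Å and a 7 Å cutoff, the maximal cliques are {0, 1}, {1, 2} and {3}, so `cluster_size_profile(residues, cutoff=7.0, max_size=3)` gives `[1, 2, 0]`. A fixed-length vector of this kind is the sort of input a random forest classifier can be trained on.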
Journal: Entropy (Basel)
Published: 2020-04-20
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).