Learning to Identify Physiological and Adventitious Metal-Binding Sites in the Three-Dimensional Structures of Proteins by Following the Hints of a Deep Neural Network

Bibliographic Details
Main Authors: Laveglia, Vincenzo; Giachetti, Andrea; Sala, Davide; Andreini, Claudia; Rosato, Antonio
Format: Online Article, Text
Language: English
Published: American Chemical Society, 2022
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9241070/
https://www.ncbi.nlm.nih.gov/pubmed/35679182
http://dx.doi.org/10.1021/acs.jcim.2c00522

Abstract: Thirty-eight percent of protein structures in the Protein Data Bank contain at least one metal ion. However, not all these metal sites are biologically relevant. Cations present as impurities during sample preparation or in the crystallization buffer can cause the formation of protein–metal complexes that do not exist in vivo. We implemented a deep learning approach to build a classifier able to distinguish between physiological and adventitious zinc-binding sites in the 3D structures of metalloproteins. We trained the classifier using manually annotated sites extracted from the MetalPDB database. Using a 10-fold cross-validation procedure, the classifier achieved an accuracy of about 90%. The same neural classifier could predict the physiological relevance of non-heme mononuclear iron sites with an accuracy of nearly 80%, suggesting that the rules learned on zinc sites have general relevance. By quantifying the relative importance of the features describing the input zinc sites from the network's perspective and by analyzing the characteristics of the MetalPDB datasets, we inferred some common principles. Physiological sites present low solvent accessibility of the amino acids forming coordination bonds with the metal ion (the metal ligands), a relatively large number of residues in the metal environment (≥20), and a distinct pattern of conservation of Cys and His residues in the site. Adventitious sites, on the other hand, tend to have few donor atoms from the polypeptide chain (often one or two). These observations support the evaluation of the physiological relevance of novel metal-binding sites in protein structures.
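The abstract reports accuracy from a 10-fold cross-validation procedure. The Python below is a minimal, generic sketch of how such a figure is usually computed, not the authors' code or pipeline: the record does not say how the folds were built, so this assumes plain shuffled (non-stratified) folds. `train_and_predict` is a hypothetical placeholder for any classifier, and the "physiological"/"adventitious" strings are illustrative labels only.

```python
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")  # one encoded metal site; its representation is left open here


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> list[list[int]]:
    """Shuffle indices 0..n-1 and deal them into k folds whose sizes differ by at most 1."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_accuracy(
    sites: Sequence[T],
    labels: Sequence[str],
    train_and_predict: Callable[[list[T], list[str], list[T]], list[str]],
    k: int = 10,
) -> float:
    """Mean accuracy over k folds: each fold is held out once as the test set
    while the hypothetical classifier is trained on the other k-1 folds."""
    if len(sites) != len(labels):
        raise ValueError("sites and labels must have the same length")
    if len(sites) < k:
        raise ValueError("need at least k examples so that no fold is empty")
    accuracies = []
    for test_idx in k_fold_indices(len(sites), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(sites)) if i not in held_out]
        preds = train_and_predict(
            [sites[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [sites[i] for i in test_idx],
        )
        correct = sum(p == labels[i] for p, i in zip(preds, test_idx))
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


def majority_baseline(train_x: list, train_y: list[str], test_x: list) -> list[str]:
    """Illustrative stand-in classifier: always predicts the most frequent training label."""
    most_common = max(set(train_y), key=train_y.count)
    return [most_common] * len(test_x)
```

For example, `cross_validated_accuracy(list(range(20)), ["physiological"] * 15 + ["adventitious"] * 5, majority_baseline)` evaluates a trivial baseline on toy data. Re-training inside every fold matters: a model that has seen a held-out site during training would inflate the reported accuracy.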

Journal: J Chem Inf Model
Dates: 2022-06-09; 2022-06-27
License: © 2022 The Authors. Published by American Chemical Society. Distributed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/), which permits the broadest form of re-use, including for commercial purposes, provided that author attribution and integrity are maintained.