Interpretable Machine Learning of Amino Acid Patterns in Proteins: A Statistical Ensemble Approach

Bibliographic Details
Main authors: Braghetto, Anna; Orlandini, Enzo; Baiesi, Marco
Format: Online Article Text
Language: English
Published: American Chemical Society 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10500975/
https://www.ncbi.nlm.nih.gov/pubmed/37552831
http://dx.doi.org/10.1021/acs.jctc.3c00383
Collection: PubMed
Description: Explainable and interpretable unsupervised machine learning helps one to understand the underlying structure of data. We introduce an ensemble analysis of machine learning models to consolidate their interpretation. Its application shows that restricted Boltzmann machines compress consistently into a few bits the information stored in a sequence of five amino acids at the start or end of α-helices or β-sheets. The weights learned by the machines reveal unexpected properties of the amino acids and the secondary structure of proteins: (i) His and Thr have a negligible contribution to the amphiphilic pattern of α-helices; (ii) there is a class of α-helices particularly rich in Ala at their end; (iii) Pro occupies most often slots otherwise occupied by polar or charged amino acids, and its presence at the start of helices is relevant; (iv) Glu and especially Asp on one side and Val, Leu, Iso, and Phe on the other display the strongest tendency to mark amphiphilic patterns, i.e., extreme values of an effective hydrophobicity, though they are not the most powerful (non)hydrophobic amino acids.
Record ID: pubmed-10500975
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: J Chem Theory Comput
Publication date: 2023-08-08
Record datestamp: 2023-09-15
License: © 2023 The Authors. Published by American Chemical Society. https://creativecommons.org/licenses/by/4.0/ Permits the broadest form of re-use including for commercial purposes, provided that author attribution and integrity are maintained.