Biomolecular simulation based machine learning models accurately predict sites of tolerability to the unnatural amino acid acridonylalanine

The incorporation of unnatural amino acids (Uaas) has provided an avenue for novel chemistries to be explored in biological systems. However, the successful application of Uaas is often hampered by site-specific impacts on protein yield and solubility. Although previous efforts to identify features...

Bibliographic Details
Main authors: Giannakoulias, Sam; Shringari, Sumant R.; Ferrie, John J.; Petersson, E. James
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8443755/
https://www.ncbi.nlm.nih.gov/pubmed/34526629
http://dx.doi.org/10.1038/s41598-021-97965-2
_version_ 1784568352880984064
author Giannakoulias, Sam
Shringari, Sumant R.
Ferrie, John J.
Petersson, E. James
collection PubMed
description The incorporation of unnatural amino acids (Uaas) has provided an avenue for novel chemistries to be explored in biological systems. However, the successful application of Uaas is often hampered by site-specific impacts on protein yield and solubility. Although previous efforts to identify features that accurately capture these site-specific effects have been unsuccessful, we have developed a set of novel Rosetta Custom Score Functions and alternative Empirical Score Functions that accurately predict the effects of acridon-2-yl-alanine (Acd) incorporation on protein yield and solubility. Acd-containing mutants were simulated in PyRosetta, and machine learning (ML) was performed using either the decomposed values of the Rosetta energy function, or changes in residue contacts and bioinformatics. Using these feature sets, which represent Rosetta score function specific and bioinformatics-derived terms, ML models were trained to predict highly abstract experimental parameters such as mutant protein yield and solubility and displayed robust performance on well-balanced holdouts. Model feature importance analyses demonstrated that terms corresponding to hydrophobic interactions, desolvation, and amino acid angle preferences played a pivotal role in predicting tolerance of mutation to Acd. Overall, this work provides evidence that the application of ML to features extracted from simulated structural models allows for the accurate prediction of diverse and abstract biological phenomena, beyond the predictivity of traditional modeling and simulation approaches.
format Online
Article
Text
id pubmed-8443755
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-8443755 2021-09-20 Biomolecular simulation based machine learning models accurately predict sites of tolerability to the unnatural amino acid acridonylalanine Giannakoulias, Sam; Shringari, Sumant R.; Ferrie, John J.; Petersson, E. James Sci Rep Article
Nature Publishing Group UK 2021-09-15 /pmc/articles/PMC8443755/ /pubmed/34526629 http://dx.doi.org/10.1038/s41598-021-97965-2 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Biomolecular simulation based machine learning models accurately predict sites of tolerability to the unnatural amino acid acridonylalanine
topic Article