Biomolecular simulation based machine learning models accurately predict sites of tolerability to the unnatural amino acid acridonylalanine
The incorporation of unnatural amino acids (Uaas) has provided an avenue for novel chemistries to be explored in biological systems. However, the successful application of Uaas is often hampered by site-specific impacts on protein yield and solubility. Although previous efforts to identify features...
Main Authors: | Giannakoulias, Sam; Shringari, Sumant R.; Ferrie, John J.; Petersson, E. James |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8443755/ https://www.ncbi.nlm.nih.gov/pubmed/34526629 http://dx.doi.org/10.1038/s41598-021-97965-2 |
_version_ | 1784568352880984064 |
---|---|
author | Giannakoulias, Sam Shringari, Sumant R. Ferrie, John J. Petersson, E. James |
author_sort | Giannakoulias, Sam |
collection | PubMed |
description | The incorporation of unnatural amino acids (Uaas) has provided an avenue for novel chemistries to be explored in biological systems. However, the successful application of Uaas is often hampered by site-specific impacts on protein yield and solubility. Although previous efforts to identify features which accurately capture these site-specific effects have been unsuccessful, we have developed a set of novel Rosetta Custom Score Functions and alternative Empirical Score Functions that accurately predict the effects of acridon-2-yl-alanine (Acd) incorporation on protein yield and solubility. Acd-containing mutants were simulated in PyRosetta, and machine learning (ML) was performed using either the decomposed values of the Rosetta energy function, or changes in residue contacts and bioinformatics. Using these feature sets, which represent Rosetta score function specific and bioinformatics-derived terms, ML models were trained to predict highly abstract experimental parameters such as mutant protein yield and solubility and displayed robust performance on well-balanced holdouts. Model feature importance analyses demonstrated that terms corresponding to hydrophobic interactions, desolvation, and amino acid angle preferences played a pivotal role in predicting tolerance of mutation to Acd. Overall, this work provides evidence that the application of ML to features extracted from simulated structural models allows for the accurate prediction of diverse and abstract biological phenomena, beyond the predictivity of traditional modeling and simulation approaches. |
format | Online Article Text |
id | pubmed-8443755 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-8443755 2021-09-20 Sci Rep Article Nature Publishing Group UK 2021-09-15 /pmc/articles/PMC8443755/ /pubmed/34526629 http://dx.doi.org/10.1038/s41598-021-97965-2 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
title | Biomolecular simulation based machine learning models accurately predict sites of tolerability to the unnatural amino acid acridonylalanine |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8443755/ https://www.ncbi.nlm.nih.gov/pubmed/34526629 http://dx.doi.org/10.1038/s41598-021-97965-2 |