
An Interpretable Machine-Learning Algorithm to Predict Disordered Protein Phase Separation Based on Biophysical Interactions

Protein phase separation is increasingly understood to be an important mechanism of biological organization and biomaterial formation. Intrinsically disordered protein regions (IDRs) are often significant drivers of protein phase separation. A number of protein phase-separation-prediction algorithms...


Bibliographic Details
Main authors: Cai, Hao; Vernon, Robert M.; Forman-Kay, Julie D.
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9405563/
https://www.ncbi.nlm.nih.gov/pubmed/36009025
http://dx.doi.org/10.3390/biom12081131
_version_ 1784773908748042240
author Cai, Hao
Vernon, Robert M.
Forman-Kay, Julie D.
collection PubMed
description Protein phase separation is increasingly understood to be an important mechanism of biological organization and biomaterial formation. Intrinsically disordered protein regions (IDRs) are often significant drivers of protein phase separation. A number of protein phase-separation-prediction algorithms are available, with many being specific for particular classes of proteins and others providing results that are not amenable to the interpretation of the contributing biophysical interactions. Here, we describe LLPhyScore, a new predictor of IDR-driven phase separation, based on a broad set of physical interactions or features. LLPhyScore uses sequence-based statistics from the RCSB PDB database of folded structures for these interactions, and is trained on a manually curated set of phase-separation-driving proteins with different negative training sets including the PDB and human proteome. Competitive training for a variety of physical chemical interactions shows the greatest contribution of solvent contacts, disorder, hydrogen bonds, pi–pi contacts, and kinked beta-structures to the score, with electrostatics, cation–pi contacts, and the absence of a helical secondary structure also contributing. LLPhyScore has strong phase-separation-prediction recall statistics and enables a breakdown of the contribution from each physical feature to a sequence’s phase-separation propensity, while recognizing the interdependence of many of these features. The tool should be a valuable resource for guiding experiments and providing hypotheses for protein function in normal and pathological states, as well as for understanding how specificity emerges in defining individual biomolecular condensates.
format Online
Article
Text
id pubmed-9405563
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9405563 2022-08-26 An Interpretable Machine-Learning Algorithm to Predict Disordered Protein Phase Separation Based on Biophysical Interactions Cai, Hao Vernon, Robert M. Forman-Kay, Julie D. Biomolecules Article
MDPI 2022-08-17 /pmc/articles/PMC9405563/ /pubmed/36009025 http://dx.doi.org/10.3390/biom12081131 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title An Interpretable Machine-Learning Algorithm to Predict Disordered Protein Phase Separation Based on Biophysical Interactions
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9405563/
https://www.ncbi.nlm.nih.gov/pubmed/36009025
http://dx.doi.org/10.3390/biom12081131