Screening membraneless organelle participants with machine-learning models that integrate multimodal features

Protein self-assembly is one of the formation mechanisms of biomolecular condensates. However, most phase-separating systems (PS) demand multiple partners in biological conditions. In this study, we divided PS proteins into two groups according to the mechanism by which they undergo PS: PS-Self prot...

Bibliographic Details
Main authors: Chen, Zhaoming; Hou, Chao; Wang, Liang; Yu, Chunyu; Chen, Taoyu; Shen, Boyan; Hou, Yaoyao; Li, Pilong; Li, Tingting
Format: Online Article Text
Language: English
Published: National Academy of Sciences 2022
Subjects: Biological Sciences
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9214545/
https://www.ncbi.nlm.nih.gov/pubmed/35687670
http://dx.doi.org/10.1073/pnas.2115369119
author Chen, Zhaoming
Hou, Chao
Wang, Liang
Yu, Chunyu
Chen, Taoyu
Shen, Boyan
Hou, Yaoyao
Li, Pilong
Li, Tingting
collection PubMed
description Protein self-assembly is one of the formation mechanisms of biomolecular condensates. However, most phase-separating (PS) systems demand multiple partners under biological conditions. In this study, we divided PS proteins into two groups according to the mechanism by which they undergo PS: PS-Self proteins can self-assemble spontaneously to form droplets, while PS-Part proteins interact with partners to undergo PS. Analysis of the amino acid composition revealed differences in the sequence pattern between the two protein groups. Existing PS predictors, when evaluated on two test protein sets, preferentially predicted self-assembling proteins. Thus, a comprehensive predictor is required. Herein, we propose that properties other than sequence composition can provide crucial information in screening PS proteins. By incorporating phosphorylation frequencies and immunofluorescence image-based droplet-forming propensity with other PS-related features, we built two independent machine-learning models to separately predict the two protein categories. Results of independent testing suggested the superiority of integrating multimodal features. We performed experimental verification on the top-scored proteins DHX9, Ki-67, and NIFK. Their PS behavior in vitro revealed the effectiveness of our models in PS prediction. Further validation on the proteome of membraneless organelles confirmed the ability of our models to identify PS-Part proteins. We implemented a web server named PhaSePred (http://predict.phasep.pro/) that incorporates our two models together with representative PS predictors. PhaSePred displays proteome-level quantiles of different features, thus profiling PS propensity and providing crucial information for identification of candidate proteins.
format Online
Article
Text
id pubmed-9214545
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher National Academy of Sciences
record_format MEDLINE/PubMed
spelling pubmed-9214545 2022-06-23 Proc Natl Acad Sci U S A Biological Sciences National Academy of Sciences 2022-06-10 2022-06-14 /pmc/articles/PMC9214545/ /pubmed/35687670 http://dx.doi.org/10.1073/pnas.2115369119 Text en Copyright © 2022 the Author(s). Published by PNAS. This article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/).
title Screening membraneless organelle participants with machine-learning models that integrate multimodal features
topic Biological Sciences