
DMFpred: Predicting protein disorder molecular functions based on protein cubic language model

Bibliographic Details
Main authors: Pang, Yihe, Liu, Bin
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9674156/
https://www.ncbi.nlm.nih.gov/pubmed/36315580
http://dx.doi.org/10.1371/journal.pcbi.1010668
author Pang, Yihe
Liu, Bin
collection PubMed
description Intrinsically disordered proteins and regions (IDPs/IDRs) are widespread in living organisms and perform various essential molecular functions. These functions fall into six general categories: entropic chain, assembler, scavenger, effector, display site, and chaperone. Alteration of IDP functions is responsible for many human diseases. Therefore, identifying the function of disordered proteins is helpful for drug target discovery and rational drug design. Experimental identification of the molecular functions of IDPs in the wet lab is expensive and laborious, and it is not applicable on a large scale. The computational methods proposed so far mainly focus on predicting the entropic chain function of IDRs, while predictive methods for the remaining five important categories of disordered molecular functions are still lacking. Motivated by the growing number of experimentally annotated functional sequences and the need to expand the coverage of disordered protein function predictors, we propose DMFpred for predicting disordered molecular functions, covering disordered assembler, scavenger, effector, display site, and chaperone. DMFpred employs the Protein Cubic Language Model (PCLM), which incorporates three protein language models for characterizing the sequence, structural, and functional features of proteins, together with attention-based alignment for capturing the relationships among the three sets of features and generating a joint representation of proteins. The PCLM was pre-trained on large-scale IDR sequences and fine-tuned on functionally annotated sequences for molecular function prediction. Performance evaluation on five categories of functional residues and on multi-functional residues suggested that DMFpred provides high-quality predictions. The DMFpred web server can be freely accessed at http://bliulab.net/DMFpred/.
format Online
Article
Text
id pubmed-9674156
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9674156 2022-11-19 DMFpred: Predicting protein disorder molecular functions based on protein cubic language model Pang, Yihe Liu, Bin PLoS Comput Biol Research Article Public Library of Science 2022-10-31 /pmc/articles/PMC9674156/ /pubmed/36315580 http://dx.doi.org/10.1371/journal.pcbi.1010668 Text en © 2022 Pang, Liu https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title DMFpred: Predicting protein disorder molecular functions based on protein cubic language model
topic Research Article