
THPLM: a sequence-based deep learning framework for protein stability changes prediction upon point variations using pretrained protein language model

Bibliographic Details
Main authors: Gong, Jianting; Jiang, Lili; Chen, Yongbing; Zhang, Yixiang; Li, Xue; Ma, Zhiqiang; Fu, Zhiguo; He, Fei; Sun, Pingping; Ren, Zilin; Tian, Mingyao
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10627365/
https://www.ncbi.nlm.nih.gov/pubmed/37874953
http://dx.doi.org/10.1093/bioinformatics/btad646
Description
Summary: MOTIVATION: Quantitative determination of protein thermodynamic stability is a critical step in protein and drug design. Reliable prediction of protein stability changes caused by point variations contributes to the development of related fields. Over the past decades, dozens of structure-based and sequence-based methods have been proposed, showing good prediction performance. Despite this progress, it remains necessary to explore wild-type and variant protein representations in order to address how a protein stability change should be represented from the perspective of the global sequence. With the development of learning-based structure prediction, protein language models (PLMs) have produced accurate, high-quality predictions of protein structure. Because PLMs capture atomic-level structural information, they can help to explain how single-point variations cause functional changes. RESULTS: Here, we proposed THPLM, a sequence-based deep learning model for stability change prediction using Meta’s ESM-2. With ESM-2 and a simple convolutional neural network, THPLM achieved performance comparable to or better than most methods, including both sequence-based and structure-based methods. Furthermore, the experimental results indicate that the sequence representations generated by the PLM can effectively improve protein function prediction. AVAILABILITY AND IMPLEMENTATION: The source code of THPLM and the testing data are available at https://github.com/FPPGroup/THPLM.
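
As a rough illustration of the wild-type and variant pairing that the summary refers to, the sketch below builds a variant sequence from a wild-type sequence and a single point variation. The function name `apply_point_variation`, the `T3S`-style notation (wild-type residue, 1-based position, variant residue), and the toy sequence are assumptions made for illustration; none of them is taken from the THPLM source code. In a pipeline of the kind described, both sequences would then be passed to a PLM such as ESM-2 to obtain representations, a step that is omitted here.

```python
import re

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def apply_point_variation(wild_type: str, variation: str) -> str:
    """Return the variant sequence for a variation such as 'T3S'.

    Hypothetical helper: the notation is wild-type residue, 1-based
    position, variant residue. Raises ValueError on malformed input
    or when the wild-type residue does not match the sequence.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", variation)
    if match is None:
        raise ValueError(f"malformed variation: {variation!r}")

    ref, pos_text, alt = match.groups()
    if ref not in AMINO_ACIDS or alt not in AMINO_ACIDS:
        raise ValueError(f"non-standard residue in {variation!r}")

    index = int(pos_text) - 1  # convert 1-based position to 0-based index
    if not 0 <= index < len(wild_type):
        raise ValueError(f"position {pos_text} is outside the sequence")
    if wild_type[index] != ref:
        raise ValueError(
            f"expected {ref} at position {pos_text}, found {wild_type[index]}"
        )

    # Replace exactly one residue; the rest of the sequence is unchanged.
    return wild_type[:index] + alt + wild_type[index + 1:]


# Toy sequence for illustration only, not a real protein.
wild_type = "MKTAYIAK"
variant = apply_point_variation(wild_type, "T3S")
print(wild_type)
print(variant)
```

Checking the stated wild-type residue against the sequence guards against off-by-one position errors before the pair is handed to any downstream model.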