S-PLM: Structure-aware Protein Language Model via Contrastive Learning between Sequence and Structure

Bibliographic Details
Main Authors: Wang, Duolin; Abbas, Usman L; Shao, Qing; Chen, Jin; Xu, Dong
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10441326/
https://www.ncbi.nlm.nih.gov/pubmed/37609352
http://dx.doi.org/10.1101/2023.08.06.552203
_version_ 1785093353609625600
author Wang, Duolin
Abbas, Usman L
Shao, Qing
Chen, Jin
Xu, Dong
author_facet Wang, Duolin
Abbas, Usman L
Shao, Qing
Chen, Jin
Xu, Dong
author_sort Wang, Duolin
collection PubMed
description Large protein language models (PLMs) have presented excellent potential to reshape protein research. The trained PLMs encode the amino acid sequence of a protein to a mathematical embedding that can be used for protein design or property prediction. It is recognized that protein 3D structure plays an important role in protein properties and functions. However, most PLMs are trained only on sequence data and lack protein 3D structure information. The lack of such crucial 3D structure information hampers the prediction capacity of PLMs in various applications, especially those heavily depending on the 3D structure. We utilize contrastive learning to develop a 3D structure-aware protein language model (S-PLM). The model encodes the sequence and 3D structure of proteins separately and deploys a multi-view contrastive loss function to enable the information exchange between the sequence and 3D structure embeddings. Our analysis shows that contrastive learning effectively incorporates 3D structure information into sequence-based embeddings. This implementation enhances the predictive performance of the sequence-based embedding in several downstream tasks.
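The multi-view contrastive loss mentioned in the description can be illustrated with a minimal sketch. The PyTorch snippet below shows a CLIP-style symmetric InfoNCE objective between per-protein sequence and structure embeddings; the function name `multiview_contrastive_loss`, the temperature value, and the exact formulation are illustrative assumptions, not the authors' verified implementation.

```python
import torch
import torch.nn.functional as F

def multiview_contrastive_loss(seq_emb: torch.Tensor,
                               struct_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Hypothetical CLIP-style symmetric InfoNCE loss between matched
    per-protein sequence and structure embeddings of shape [batch, dim]."""
    # L2-normalize so dot products become cosine similarities.
    seq = F.normalize(seq_emb, dim=-1)
    struct = F.normalize(struct_emb, dim=-1)

    # Pairwise similarity matrix; the diagonal holds matched (positive) pairs.
    logits = seq @ struct.t() / temperature
    targets = torch.arange(seq.size(0), device=seq.device)

    # Symmetric cross-entropy: sequence-to-structure and structure-to-sequence.
    loss_seq2str = F.cross_entropy(logits, targets)
    loss_str2seq = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_seq2str + loss_str2seq)
```

In a batch of matched (sequence, structure) pairs, each protein's structure embedding acts as the positive for its own sequence embedding and as a negative for every other sequence in the batch, which is what pushes 3D structure information into the sequence-based embedding.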
format Online
Article
Text
id pubmed-10441326
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10441326 2023-08-22 S-PLM: Structure-aware Protein Language Model via Contrastive Learning between Sequence and Structure Wang, Duolin Abbas, Usman L Shao, Qing Chen, Jin Xu, Dong bioRxiv Article Large protein language models (PLMs) have presented excellent potential to reshape protein research. The trained PLMs encode the amino acid sequence of a protein to a mathematical embedding that can be used for protein design or property prediction. It is recognized that protein 3D structure plays an important role in protein properties and functions. However, most PLMs are trained only on sequence data and lack protein 3D structure information. The lack of such crucial 3D structure information hampers the prediction capacity of PLMs in various applications, especially those heavily depending on the 3D structure. We utilize contrastive learning to develop a 3D structure-aware protein language model (S-PLM). The model encodes the sequence and 3D structure of proteins separately and deploys a multi-view contrastive loss function to enable the information exchange between the sequence and 3D structure embeddings. Our analysis shows that contrastive learning effectively incorporates 3D structure information into sequence-based embeddings. This implementation enhances the predictive performance of the sequence-based embedding in several downstream tasks. Cold Spring Harbor Laboratory 2023-08-07 /pmc/articles/PMC10441326/ /pubmed/37609352 http://dx.doi.org/10.1101/2023.08.06.552203 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.
spellingShingle Article
Wang, Duolin
Abbas, Usman L
Shao, Qing
Chen, Jin
Xu, Dong
S-PLM: Structure-aware Protein Language Model via Contrastive Learning between Sequence and Structure
title S-PLM: Structure-aware Protein Language Model via Contrastive Learning between Sequence and Structure
title_full S-PLM: Structure-aware Protein Language Model via Contrastive Learning between Sequence and Structure
title_fullStr S-PLM: Structure-aware Protein Language Model via Contrastive Learning between Sequence and Structure
title_full_unstemmed S-PLM: Structure-aware Protein Language Model via Contrastive Learning between Sequence and Structure
title_short S-PLM: Structure-aware Protein Language Model via Contrastive Learning between Sequence and Structure
title_sort s-plm: structure-aware protein language model via contrastive learning between sequence and structure
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10441326/
https://www.ncbi.nlm.nih.gov/pubmed/37609352
http://dx.doi.org/10.1101/2023.08.06.552203
work_keys_str_mv AT wangduolin splmstructureawareproteinlanguagemodelviacontrastivelearningbetweensequenceandstructure
AT abbasusmanl splmstructureawareproteinlanguagemodelviacontrastivelearningbetweensequenceandstructure
AT shaoqing splmstructureawareproteinlanguagemodelviacontrastivelearningbetweensequenceandstructure
AT chenjin splmstructureawareproteinlanguagemodelviacontrastivelearningbetweensequenceandstructure
AT xudong splmstructureawareproteinlanguagemodelviacontrastivelearningbetweensequenceandstructure