S-PLM: Structure-aware Protein Language Model via Contrastive Learning between Sequence and Structure
Large protein language models (PLMs) have presented excellent potential to reshape protein research. The trained PLMs encode the amino acid sequence of a protein to a mathematical embedding that can be used for protein design or property prediction. It is recognized that protein 3D structure plays an important role in protein properties and functions. However, most PLMs are trained only on sequence data and lack protein 3D structure information. The lack of such crucial 3D structure information hampers the prediction capacity of PLMs in various applications, especially those heavily depending on the 3D structure. We utilize contrastive learning to develop a 3D structure-aware protein language model (S-PLM). The model encodes the sequence and 3D structure of proteins separately and deploys a multi-view contrastive loss function to enable the information exchange between the sequence and 3D structure embeddings. Our analysis shows that contrastive learning effectively incorporates 3D structure information into sequence-based embeddings. This implementation enhances the predictive performance of the sequence-based embedding in several downstream tasks.

Main authors: Wang, Duolin; Abbas, Usman L; Shao, Qing; Chen, Jin; Xu, Dong
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory, 2023
Subjects: Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10441326/ https://www.ncbi.nlm.nih.gov/pubmed/37609352 http://dx.doi.org/10.1101/2023.08.06.552203
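The abstract describes the core technique: separate sequence and structure encoders trained with a multi-view contrastive loss that aligns the two embeddings of the same protein. As a reading aid, here is a minimal sketch of one common form of such a loss (a symmetric, CLIP-style InfoNCE); the function name, temperature value, and embedding shapes are illustrative assumptions, not S-PLM's actual implementation, whose details are given in the paper.

```python
# Hypothetical sketch of a multi-view contrastive loss between per-protein
# sequence and structure embeddings (symmetric, CLIP-style InfoNCE).
# All names and hyperparameters here are assumptions for illustration.
import torch
import torch.nn.functional as F

def multiview_contrastive_loss(seq_emb: torch.Tensor,
                               struct_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """seq_emb, struct_emb: (batch, dim) embeddings of the same proteins,
    produced by separate sequence and structure encoders."""
    seq = F.normalize(seq_emb, dim=-1)        # project to unit sphere
    struct = F.normalize(struct_emb, dim=-1)
    logits = seq @ struct.t() / temperature    # pairwise cosine similarities
    # Matched sequence/structure pairs sit on the diagonal.
    targets = torch.arange(seq.size(0), device=seq.device)
    # Symmetric loss: pull each sequence toward its own structure and vice versa.
    loss_seq2struct = F.cross_entropy(logits, targets)
    loss_struct2seq = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_seq2struct + loss_struct2seq)

if __name__ == "__main__":
    # Random embeddings standing in for encoder outputs.
    seq_emb = torch.randn(8, 256)
    struct_emb = torch.randn(8, 256)
    print(multiview_contrastive_loss(seq_emb, struct_emb))
```

Minimizing this loss pushes the sequence embedding of each protein toward the embedding of its own 3D structure and away from the structures of other proteins in the batch, which is how contrastive training can inject structure information into a sequence-only encoder.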
_version_ | 1785093353609625600 |
author | Wang, Duolin; Abbas, Usman L; Shao, Qing; Chen, Jin; Xu, Dong
collection | PubMed |
description | Large protein language models (PLMs) have presented excellent potential to reshape protein research. The trained PLMs encode the amino acid sequence of a protein to a mathematical embedding that can be used for protein design or property prediction. It is recognized that protein 3D structure plays an important role in protein properties and functions. However, most PLMs are trained only on sequence data and lack protein 3D structure information. The lack of such crucial 3D structure information hampers the prediction capacity of PLMs in various applications, especially those heavily depending on the 3D structure. We utilize contrastive learning to develop a 3D structure-aware protein language model (S-PLM). The model encodes the sequence and 3D structure of proteins separately and deploys a multi-view contrastive loss function to enable the information exchange between the sequence and 3D structure embeddings. Our analysis shows that contrastive learning effectively incorporates 3D structure information into sequence-based embeddings. This implementation enhances the predictive performance of the sequence-based embedding in several downstream tasks. |
format | Online Article Text |
id | pubmed-10441326 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-10441326 2023-08-22. bioRxiv Article. Cold Spring Harbor Laboratory 2023-08-07. /pmc/articles/PMC10441326/ /pubmed/37609352 http://dx.doi.org/10.1101/2023.08.06.552203 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.
title | S-PLM: Structure-aware Protein Language Model via Contrastive Learning between Sequence and Structure |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10441326/ https://www.ncbi.nlm.nih.gov/pubmed/37609352 http://dx.doi.org/10.1101/2023.08.06.552203 |