Structure-aware protein self-supervised learning

Bibliographic Details
Main authors: Chen, Can (Sam), Zhou, Jingbo, Wang, Fan, Liu, Xue, Dou, Dejing
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10139775/
https://www.ncbi.nlm.nih.gov/pubmed/37052532
http://dx.doi.org/10.1093/bioinformatics/btad189
collection PubMed
description MOTIVATION: Protein representation learning methods have shown great potential for many downstream tasks in biological applications. A few recent studies have demonstrated that self-supervised learning is a promising solution for addressing the insufficient labels of proteins, which is a major obstacle to effective protein representation learning. However, existing protein representation learning methods are usually pretrained on protein sequences without considering important protein structural information.
RESULTS: In this work, we propose a novel structure-aware protein self-supervised learning method to effectively capture the structural information of proteins. In particular, a graph neural network model is pretrained to preserve protein structural information with self-supervised tasks from two perspectives: pairwise residue distances and dihedral angles. Furthermore, we propose to leverage an available protein language model pretrained on protein sequences to enhance the self-supervised learning. Specifically, we identify the relation between the sequential information in the protein language model and the structural information in the specially designed graph neural network model via a novel pseudo bi-level optimization scheme. We conduct experiments on three downstream tasks: binary classification into membrane/non-membrane proteins, location classification into 10 cellular compartments, and enzyme-catalyzed reaction classification into 384 EC numbers; these experiments verify the effectiveness of the proposed method.
AVAILABILITY AND IMPLEMENTATION: The AlphaFold2 database is available at https://alphafold.ebi.ac.uk/. The PDB files are available at https://www.rcsb.org/. The downstream task datasets are available at https://github.com/phermosilla/IEConv_proteins/tree/master/Datasets. The code for the proposed method is available at https://github.com/GGchen1997/STEPS_Bioinformatics.
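The abstract names two structural signals, pairwise residue distances and dihedral angles, without giving formulas. As a minimal sketch, and not the authors' implementation (their code is in the STEPS_Bioinformatics repository linked above), the Python below computes those geometric quantities from 3D coordinates. It assumes that residues are represented by C-alpha coordinates for the distances, that the dihedral angles are the backbone phi/psi torsions computed from (N, CA, C) atoms, and that the residue graph links residues closer than a hypothetical 8 Å cutoff. How the paper turns these quantities into self-supervised pretext tasks is not reproduced here.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def pairwise_distances(coords: Sequence[Vec]) -> List[List[float]]:
    """Euclidean distance between every pair of residue positions (assumed C-alpha atoms)."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def radius_graph(dist: List[List[float]], cutoff: float = 8.0) -> List[Tuple[int, int]]:
    """Directed edges (i, j), i != j, for residue pairs closer than `cutoff` angstroms.

    The 8.0 default is a hypothetical value chosen for illustration only.
    """
    n = len(dist)
    return [(i, j) for i in range(n) for j in range(n) if i != j and dist[i][j] < cutoff]


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle in radians, in [-pi, pi], defined by four consecutive points."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.atan2(y, x)


def backbone_phi_psi(
    backbone: Sequence[Tuple[Vec, Vec, Vec]],
) -> Tuple[List[Optional[float]], List[Optional[float]]]:
    """phi/psi angles from per-residue (N, CA, C) coordinates.

    phi is undefined for the first residue and psi for the last; those entries are None.
    """
    n = len(backbone)
    phi: List[Optional[float]] = [None] * n
    psi: List[Optional[float]] = [None] * n
    for i, (n_atom, ca, c) in enumerate(backbone):
        if i > 0:
            phi[i] = dihedral(backbone[i - 1][2], n_atom, ca, c)  # C(i-1), N, CA, C
        if i < n - 1:
            psi[i] = dihedral(n_atom, ca, c, backbone[i + 1][0])  # N, CA, C, N(i+1)
    return phi, psi


if __name__ == "__main__":
    d = pairwise_distances([(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 10.0)])
    print(d[0][1])  # 5.0
    print(radius_graph(d))  # [(0, 1), (1, 0)]
    angle = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 1.0, 1.0))
    print(round(math.degrees(angle), 6))  # -90.0
```

The signed `atan2` form of the torsion formula is used instead of an `acos` of the normals' dot product because it preserves the sign of the angle, which distinguishes, for example, a phi of about -60° from +60°.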
id pubmed-10139775
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-10139775 2023-04-28 Bioinformatics Original Paper. Oxford University Press 2023-04-13. © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Original Paper