Structure-aware protein self-supervised learning
MOTIVATION: Protein representation learning methods have shown great potential for many downstream tasks in biological applications. A few recent studies have demonstrated that self-supervised learning is a promising solution for addressing the scarcity of protein labels, which is a major obstacle...
Main authors: | Chen, Can (Sam); Zhou, Jingbo; Wang, Fan; Liu, Xue; Dou, Dejing |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2023 |
Subjects: | Original Paper |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10139775/ https://www.ncbi.nlm.nih.gov/pubmed/37052532 http://dx.doi.org/10.1093/bioinformatics/btad189 |
_version_ | 1785033017480183808 |
author | Chen, Can (Sam); Zhou, Jingbo; Wang, Fan; Liu, Xue; Dou, Dejing
author_facet | Chen, Can (Sam); Zhou, Jingbo; Wang, Fan; Liu, Xue; Dou, Dejing
author_sort | Chen, Can (Sam) |
collection | PubMed |
description | MOTIVATION: Protein representation learning methods have shown great potential for many downstream tasks in biological applications. A few recent studies have demonstrated that self-supervised learning is a promising solution for addressing the scarcity of protein labels, which is a major obstacle to effective protein representation learning. However, existing protein representation learning methods are usually pretrained on protein sequences without considering the important protein structural information. RESULTS: In this work, we propose a novel structure-aware protein self-supervised learning method to effectively capture structural information of proteins. In particular, a graph neural network model is pretrained to preserve the protein structural information with self-supervised tasks from a pairwise residue distance perspective and a dihedral angle perspective. Furthermore, we propose to leverage the available protein language model pretrained on protein sequences to enhance the self-supervised learning. Specifically, we identify the relation between the sequential information in the protein language model and the structural information in the specially designed graph neural network model via a novel pseudo bi-level optimization scheme. We conduct experiments on three downstream tasks: binary classification into membrane/non-membrane proteins, location classification into 10 cellular compartments, and enzyme-catalyzed reaction classification into 384 EC numbers; these experiments verify the effectiveness of the proposed method. AVAILABILITY AND IMPLEMENTATION: The AlphaFold2 database is available at https://alphafold.ebi.ac.uk/. The PDB files are available at https://www.rcsb.org/. The downstream task datasets are available at https://github.com/phermosilla/IEConv_proteins/tree/master/Datasets. The code of the proposed method is available at https://github.com/GGchen1997/STEPS_Bioinformatics. |
format | Online Article Text |
id | pubmed-10139775 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-10139775 2023-04-28 Structure-aware protein self-supervised learning Chen, Can (Sam); Zhou, Jingbo; Wang, Fan; Liu, Xue; Dou, Dejing Bioinformatics Original Paper MOTIVATION: Protein representation learning methods have shown great potential for many downstream tasks in biological applications. A few recent studies have demonstrated that self-supervised learning is a promising solution for addressing the scarcity of protein labels, which is a major obstacle to effective protein representation learning. However, existing protein representation learning methods are usually pretrained on protein sequences without considering the important protein structural information. RESULTS: In this work, we propose a novel structure-aware protein self-supervised learning method to effectively capture structural information of proteins. In particular, a graph neural network model is pretrained to preserve the protein structural information with self-supervised tasks from a pairwise residue distance perspective and a dihedral angle perspective. Furthermore, we propose to leverage the available protein language model pretrained on protein sequences to enhance the self-supervised learning. Specifically, we identify the relation between the sequential information in the protein language model and the structural information in the specially designed graph neural network model via a novel pseudo bi-level optimization scheme. We conduct experiments on three downstream tasks: binary classification into membrane/non-membrane proteins, location classification into 10 cellular compartments, and enzyme-catalyzed reaction classification into 384 EC numbers; these experiments verify the effectiveness of the proposed method. AVAILABILITY AND IMPLEMENTATION: The AlphaFold2 database is available at https://alphafold.ebi.ac.uk/. The PDB files are available at https://www.rcsb.org/. The downstream task datasets are available at https://github.com/phermosilla/IEConv_proteins/tree/master/Datasets. The code of the proposed method is available at https://github.com/GGchen1997/STEPS_Bioinformatics. Oxford University Press 2023-04-13 /pmc/articles/PMC10139775/ /pubmed/37052532 http://dx.doi.org/10.1093/bioinformatics/btad189 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Paper; Chen, Can (Sam); Zhou, Jingbo; Wang, Fan; Liu, Xue; Dou, Dejing; Structure-aware protein self-supervised learning
title | Structure-aware protein self-supervised learning |
title_full | Structure-aware protein self-supervised learning |
title_fullStr | Structure-aware protein self-supervised learning |
title_full_unstemmed | Structure-aware protein self-supervised learning |
title_short | Structure-aware protein self-supervised learning |
title_sort | structure-aware protein self-supervised learning |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10139775/ https://www.ncbi.nlm.nih.gov/pubmed/37052532 http://dx.doi.org/10.1093/bioinformatics/btad189 |
work_keys_str_mv | AT chencansam structureawareproteinselfsupervisedlearning AT zhoujingbo structureawareproteinselfsupervisedlearning AT wangfan structureawareproteinselfsupervisedlearning AT liuxue structureawareproteinselfsupervisedlearning AT doudejing structureawareproteinselfsupervisedlearning |
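
The RESULTS portion of the description field above outlines two structural self-supervised signals used to pretrain the graph neural network: pairwise residue distances and backbone dihedral angles. The Python sketch below is not the authors' implementation (that is in the linked repository https://github.com/GGchen1997/STEPS_Bioinformatics); it only illustrates, under assumed atom choices and bin edges, how such targets could be derived from backbone coordinates and discretized for self-supervised prediction.

```python
# Illustrative sketch only: computes pairwise C-alpha distances and backbone
# phi/psi dihedral angles, then bins them into class labels of the kind a
# structure-aware self-supervised task could predict. Atom choices, bin edges,
# and bin counts are assumptions, not the paper's actual configuration.
import numpy as np


def pairwise_ca_distances(ca: np.ndarray) -> np.ndarray:
    """ca: (L, 3) C-alpha coordinates -> (L, L) distance matrix in Angstroms."""
    diff = ca[:, None, :] - ca[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def dihedral(p0, p1, p2, p3) -> float:
    """Signed dihedral angle (radians) defined by four atoms p0-p1-p2-p3."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    v = b0 - np.dot(b0, b1) * b1  # b0 projected onto the plane orthogonal to b1
    w = b2 - np.dot(b2, b1) * b1  # b2 projected onto the plane orthogonal to b1
    return float(np.arctan2(np.dot(np.cross(b1, v), w), np.dot(v, w)))


def backbone_phi_psi(n, ca, c):
    """Per-residue (phi, psi) from backbone N, CA, C coordinates, each (L, 3)."""
    length = len(ca)
    phi = np.full(length, np.nan)  # phi is undefined for the first residue
    psi = np.full(length, np.nan)  # psi is undefined for the last residue
    for i in range(1, length):
        phi[i] = dihedral(c[i - 1], n[i], ca[i], c[i])
    for i in range(length - 1):
        psi[i] = dihedral(n[i], ca[i], c[i], n[i + 1])
    return phi, psi


def binned_targets(dist, phi, psi, dist_bins=None, n_angle_bins=36):
    """Discretize distances and angles into integer labels for classification-style
    self-supervised objectives. Residues with undefined angles (NaN) should be
    masked out of the loss by the caller."""
    if dist_bins is None:
        dist_bins = np.linspace(2.0, 20.0, 16)  # assumed bin edges, in Angstroms
    dist_labels = np.digitize(dist, dist_bins)
    angle_edges = np.linspace(-np.pi, np.pi, n_angle_bins + 1)
    phi_labels = np.clip(np.digitize(phi, angle_edges) - 1, 0, n_angle_bins - 1)
    psi_labels = np.clip(np.digitize(psi, angle_edges) - 1, 0, n_angle_bins - 1)
    return dist_labels, phi_labels, psi_labels
```

A pretraining loop would then train the graph neural network to predict such binned labels (masking residues whose angles are undefined), while the coupling with the pretrained protein language model through the pseudo bi-level optimization scheme is defined in the paper itself.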