Cargando…

Fast protein structure comparison through effective representation learning with contrastive graph neural networks

Protein structure alignment algorithms are often time-consuming, resulting in challenges for large-scale protein structure similarity-based retrieval. There is an urgent need for more efficient structure comparison approaches as the number of protein structures increases rapidly. In this paper, we p...

Descripción completa

Detalles Bibliográficos
Autores principales: Xia, Chunqiu, Feng, Shi-Hao, Xia, Ying, Pan, Xiaoyong, Shen, Hong-Bin
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Public Library of Science 2022
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8982879/
https://www.ncbi.nlm.nih.gov/pubmed/35324898
http://dx.doi.org/10.1371/journal.pcbi.1009986
Descripción
Sumario:Protein structure alignment algorithms are often time-consuming, resulting in challenges for large-scale protein structure similarity-based retrieval. There is an urgent need for more efficient structure comparison approaches as the number of protein structures increases rapidly. In this paper, we propose an effective graph-based protein structure representation learning method, GraSR, for fast and accurate structure comparison. In GraSR, a graph is constructed based on the intra-residue distance derived from the tertiary structure. Then, deep graph neural networks (GNNs) with a short-cut connection learn graph representations of the tertiary structures under a contrastive learning framework. To further improve GraSR, a novel dynamic training data partition strategy and length-scaling cosine distance are introduced. We objectively evaluate our method GraSR on SCOPe v2.07 and a new released independent test set from PDB database with a designed comprehensive performance metric. Compared with other state-of-the-art methods, GraSR achieves about 7%-10% improvement on two benchmark datasets. GraSR is also much faster than alignment-based methods. We dig into the model and observe that the superiority of GraSR is mainly brought by the learned discriminative residue-level and global descriptors. The web-server and source code of GraSR are freely available at www.csbio.sjtu.edu.cn/bioinf/GraSR/ for academic use.