
Graph-BERT and language model-based framework for protein–protein interaction identification



Bibliographic Details
Main authors: Jha, Kanchan, Karmakar, Sourav, Saha, Sriparna
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10079975/
https://www.ncbi.nlm.nih.gov/pubmed/37024543
http://dx.doi.org/10.1038/s41598-023-31612-w
author Jha, Kanchan
Karmakar, Sourav
Saha, Sriparna
collection PubMed
description Identifying protein–protein interactions (PPI) is a critical problem in bioinformatics. As artificial intelligence (AI) techniques have advanced, previous studies have applied a range of AI-based models to PPI classification. The input to these models consists of features extracted from different sources of protein information, mainly sequence-derived features. In this work, we present an AI-based PPI identification model that uses both a PPI network and protein sequences. The PPI network is represented as a graph in which each node is a protein pair, and an edge connects two nodes if they share a common protein. Each node in the graph has a feature vector. We use a language model to extract feature vectors directly from protein sequences. The feature vectors of the two proteins in a pair are concatenated and used as the node feature vector in the PPI network graph. We then use the Graph-BERT model to encode the PPI network graph with its sequence-based features and learn a hidden representation for each node. The learned node representations are fed to a fully connected layer, whose output is passed to a softmax layer to classify the protein interactions. To assess the efficacy of the proposed PPI model, we performed experiments on several PPI datasets. The experimental results show that the proposed approach outperforms existing PPI methods and the designed baselines in classifying PPI.
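The graph construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the protein IDs, the 2-dimensional embeddings, and the function names are hypothetical, the language-model embeddings are replaced by hand-written vectors, and the Graph-BERT encoder is omitted, so the raw node features stand in for the learned hidden representations.

```python
from itertools import combinations
import math


def build_pair_graph(pairs):
    """Nodes are protein pairs; two nodes are linked when they share a protein."""
    edges = []
    for i, j in combinations(range(len(pairs)), 2):
        if set(pairs[i]) & set(pairs[j]):
            edges.append((i, j))
    return edges


def node_features(pairs, embeddings):
    """Concatenate the two per-protein vectors to form each node's feature vector."""
    return [embeddings[a] + embeddings[b] for a, b in pairs]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(hidden, weights, bias):
    """One fully connected layer followed by softmax over the classes."""
    logits = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Toy data: hypothetical protein IDs and stand-in embeddings (assumptions).
pairs = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
embeddings = {
    "P1": [0.1, 0.2],
    "P2": [0.3, 0.4],
    "P3": [0.5, 0.6],
    "P4": [0.7, 0.8],
    "P5": [0.9, 1.0],
}

edges = build_pair_graph(pairs)            # only nodes 0 and 1 share a protein (P2)
features = node_features(pairs, embeddings)

# Untrained (all-zero) weights, used only to show the shape of the final step.
probs = classify(features[0], [[0.0] * 4, [0.0] * 4], [0.0, 0.0])
```

In this toy example the first two pairs share protein P2, so they are the only connected nodes. In the approach the abstract describes, the concatenated vectors would pass through Graph-BERT before the fully connected and softmax layers.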
format Online
Article
Text
id pubmed-10079975
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10079975 2023-04-08 Sci Rep Article Nature Publishing Group UK 2023-04-06 /pmc/articles/PMC10079975/ /pubmed/37024543 http://dx.doi.org/10.1038/s41598-023-31612-w Text en
© The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Graph-BERT and language model-based framework for protein–protein interaction identification