A robust protein language model for SARS-CoV-2 protein–protein interaction network prediction

Protein-protein interaction is one of the ways viruses interact with their hosts. Therefore, identifying protein interactions between viruses and hosts helps explain how virus proteins work, how they replicate, and how they cause disease. SARS-CoV-2 is a new type of virus that emerged from the coron...

Bibliographic Details
Main author: Ozger, Zeynep Banu
Format: Online Article Text
Language: English
Published: Elsevier B.V. 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10162945/
https://www.ncbi.nlm.nih.gov/pubmed/37316102
http://dx.doi.org/10.1016/j.artmed.2023.102574
author Ozger, Zeynep Banu
collection PubMed
description Protein-protein interaction (PPI) is one of the ways viruses interact with their hosts. Identifying protein interactions between viruses and hosts therefore helps explain how viral proteins work and how viruses replicate and cause disease. SARS-CoV-2 is a virus from the coronavirus family that emerged in 2019 and caused a worldwide pandemic. Detecting the human proteins that interact with this novel virus plays an important role in monitoring the cellular processes of virus-associated infection. This study proposes a natural language processing-based collective learning method for predicting potential SARS-CoV-2-human PPIs. Protein language models were obtained with the prediction-based Word2Vec and Doc2Vec embedding methods and the frequency-based term frequency-inverse document frequency (tf-idf) method. Known interactions were represented with the proposed language models and with traditional feature extraction methods (conjoint triad and repeat pattern), and their performances were compared. Support vector machine (SVM), artificial neural network (ANN), k-nearest neighbor (KNN), naive Bayes (NB), decision tree (DT), and ensemble algorithms were trained on the interaction data. Experimental results show that protein language models are a promising protein representation method for PPI prediction. The tf-idf-based language model predicted SARS-CoV-2 PPIs with an error of 1.4%. Additionally, the decisions of high-performing learning models for different feature extraction methods were combined with a collective voting approach to make new interaction predictions. For 10,000 human proteins, the combined models predicted 285 new potential interactions.
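The abstract names a frequency-based tf-idf protein representation and a collective voting step but gives no implementation details. The following is a minimal sketch of both ideas, not the author's method. It assumes overlapping 3-mers as the "words" of a protein sequence, the plain idf form log(N/df), and simple majority voting over 0/1 labels. The function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def kmers(sequence, k=3):
    """Split a protein sequence into overlapping k-mers (assumed tokenization)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def tfidf_vectors(sequences, k=3):
    """Return (vocabulary, vectors) with one tf-idf vector per sequence."""
    docs = [Counter(kmers(s, k)) for s in sequences]
    vocab = sorted(set().union(*docs))
    n_docs = len(docs)
    # Document frequency: number of sequences that contain each k-mer.
    df = {w: sum(1 for d in docs if w in d) for w in vocab}
    vectors = []
    for d in docs:
        total = sum(d.values())
        vectors.append([
            (d[w] / total) * math.log(n_docs / df[w]) if total else 0.0
            for w in vocab
        ])
    return vocab, vectors


def majority_vote(predictions):
    """Combine 0/1 labels from several classifiers; a tie counts as 0."""
    return 1 if sum(predictions) * 2 > len(predictions) else 0


# Toy, made-up sequences (not real proteins), for illustration only.
toy_sequences = ["MKVLAAGM", "MKVLTTGM", "GGGSSGGG"]
vocab, vectors = tfidf_vectors(toy_sequences, k=3)
```

In a setup like the one described, such vectors would be fed to the listed classifiers, and `majority_vote` would merge their per-pair decisions. The paper's actual tokenization, weighting, and voting rule may differ.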
format Online
Article
Text
id pubmed-10162945
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Elsevier B.V.
record_format MEDLINE/PubMed
spelling pubmed-10162945 2023-05-08 A robust protein language model for SARS-CoV-2 protein–protein interaction network prediction Ozger, Zeynep Banu Artif Intell Med Research Paper Elsevier B.V.
2023-08 2023-05-06 /pmc/articles/PMC10162945/ /pubmed/37316102 http://dx.doi.org/10.1016/j.artmed.2023.102574 Text en © 2023 Elsevier B.V. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
title A robust protein language model for SARS-CoV-2 protein–protein interaction network prediction
topic Research Paper