Prediction of hot spots towards drug discovery by protein sequence embedding with 1D convolutional neural network


Bibliographic Details
Main authors: Zhang, Youzhi; Yao, Sijie; Chen, Peng
Format: Online Article Text
Language: English
Published: Public Library of Science 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10506709/
https://www.ncbi.nlm.nih.gov/pubmed/37721924
http://dx.doi.org/10.1371/journal.pone.0290899
_version_ 1785107161093767168
author Zhang, Youzhi
Yao, Sijie
Chen, Peng
author_facet Zhang, Youzhi
Yao, Sijie
Chen, Peng
author_sort Zhang, Youzhi
collection PubMed
description Protein hotspot residues are key sites that mediate protein-protein interactions. Accurate identification of these residues is essential for understanding how proteins carry out their functions and for designing drug targets. Current research has mostly focused on machine learning methods that predict hot spots from known interface residues. These methods rely on manually extracted features of amino acid residues, drawn from sequence, structure, evolution, energy, and other information, to train and test the models, a process that is cumbersome, time-consuming, and laborious. This paper proposes Embed-1dCNN, which combines a pre-trained protein sequence embedding model with a one-dimensional convolutional neural network to predict protein hotspot residues. To obtain a larger set of samples, this work integrates and extracts data from the ASEdb, BID, SKEMPI, and dbMPIKT datasets to generate a new dataset, and uses the SMOTE algorithm to expand the positive samples that form the training set. The experimental results show that the method achieves an F1 score of 0.82 on the test set, which the authors report as better prediction performance than other hot spot prediction methods.
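The F1 score quoted in the description is the harmonic mean of precision and recall. The following is a minimal sketch of that calculation, not code from the paper; the function name `f1_score` and the confusion counts are hypothetical and chosen only for illustration.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall.

    tp, fp, fn are counts of true positives, false positives,
    and false negatives for the positive (hot spot) class.
    """
    if tp == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)  # fraction of predicted hot spots that are real
    recall = tp / (tp + fn)     # fraction of real hot spots that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not taken from the paper).
example = f1_score(tp=8, fp=2, fn=2)
print(round(example, 2))
```

Because F1 ignores true negatives, it is a common choice when the positive class is rare, which is presumably also why the training set was rebalanced with SMOTE.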
format Online
Article
Text
id pubmed-10506709
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-10506709 2023-09-19 Prediction of hot spots towards drug discovery by protein sequence embedding with 1D convolutional neural network Zhang, Youzhi Yao, Sijie Chen, Peng PLoS One Research Article Public Library of Science 2023-09-18 /pmc/articles/PMC10506709/ /pubmed/37721924 http://dx.doi.org/10.1371/journal.pone.0290899 Text en © 2023 Zhang et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Zhang, Youzhi
Yao, Sijie
Chen, Peng
Prediction of hot spots towards drug discovery by protein sequence embedding with 1D convolutional neural network
title Prediction of hot spots towards drug discovery by protein sequence embedding with 1D convolutional neural network
title_full Prediction of hot spots towards drug discovery by protein sequence embedding with 1D convolutional neural network
title_fullStr Prediction of hot spots towards drug discovery by protein sequence embedding with 1D convolutional neural network
title_full_unstemmed Prediction of hot spots towards drug discovery by protein sequence embedding with 1D convolutional neural network
title_short Prediction of hot spots towards drug discovery by protein sequence embedding with 1D convolutional neural network
title_sort prediction of hot spots towards drug discovery by protein sequence embedding with 1d convolutional neural network
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10506709/
https://www.ncbi.nlm.nih.gov/pubmed/37721924
http://dx.doi.org/10.1371/journal.pone.0290899
work_keys_str_mv AT zhangyouzhi predictionofhotspotstowardsdrugdiscoverybyproteinsequenceembeddingwith1dconvolutionalneuralnetwork
AT yaosijie predictionofhotspotstowardsdrugdiscoverybyproteinsequenceembeddingwith1dconvolutionalneuralnetwork
AT chenpeng predictionofhotspotstowardsdrugdiscoverybyproteinsequenceembeddingwith1dconvolutionalneuralnetwork