Prediction of hot spots towards drug discovery by protein sequence embedding with 1D convolutional neural network
Protein hotspot residues are key sites that mediate protein-protein interactions. Accurate identification of these residues is essential for understanding the mechanism from protein to function and for designing drug targets. Current research has mostly focused on using machine learning methods to p...
Main authors: | Zhang, Youzhi; Yao, Sijie; Chen, Peng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2023 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10506709/ https://www.ncbi.nlm.nih.gov/pubmed/37721924 http://dx.doi.org/10.1371/journal.pone.0290899 |
_version_ | 1785107161093767168 |
---|---|
author | Zhang, Youzhi Yao, Sijie Chen, Peng |
author_facet | Zhang, Youzhi Yao, Sijie Chen, Peng |
author_sort | Zhang, Youzhi |
collection | PubMed |
description | Protein hotspot residues are key sites that mediate protein-protein interactions. Accurate identification of these residues is essential for understanding the mechanism from protein to function and for designing drug targets. Current research has mostly focused on using machine learning methods to predict hot spots from known interface residues, which artificially extract the corresponding features of amino acid residues from sequence, structure, evolution, energy, and other information to train and test machine learning models. The process is cumbersome, time-consuming and laborious to some extent. This paper proposes a novel idea that develops a pre-trained protein sequence embedding model combined with a one-dimensional convolutional neural network, called Embed-1dCNN, to predict protein hotspot residues. In order to obtain large data samples, this work integrates and extracts data from the datasets of ASEdb, BID, SKEMPI and dbMPIKT to generate a new dataset, and adopts the SMOTE algorithm to expand positive samples to form the training set. The experimental results show that the method achieves an F1 score of 0.82 on the test set. Compared with other hot spot prediction methods, our model achieved better prediction performance. |
format | Online Article Text |
id | pubmed-10506709 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-10506709 2023-09-19 Prediction of hot spots towards drug discovery by protein sequence embedding with 1D convolutional neural network Zhang, Youzhi Yao, Sijie Chen, Peng PLoS One Research Article Protein hotspot residues are key sites that mediate protein-protein interactions. Accurate identification of these residues is essential for understanding the mechanism from protein to function and for designing drug targets. Current research has mostly focused on using machine learning methods to predict hot spots from known interface residues, which artificially extract the corresponding features of amino acid residues from sequence, structure, evolution, energy, and other information to train and test machine learning models. The process is cumbersome, time-consuming and laborious to some extent. This paper proposes a novel idea that develops a pre-trained protein sequence embedding model combined with a one-dimensional convolutional neural network, called Embed-1dCNN, to predict protein hotspot residues. In order to obtain large data samples, this work integrates and extracts data from the datasets of ASEdb, BID, SKEMPI and dbMPIKT to generate a new dataset, and adopts the SMOTE algorithm to expand positive samples to form the training set. The experimental results show that the method achieves an F1 score of 0.82 on the test set. Compared with other hot spot prediction methods, our model achieved better prediction performance. Public Library of Science 2023-09-18 /pmc/articles/PMC10506709/ /pubmed/37721924 http://dx.doi.org/10.1371/journal.pone.0290899 Text en © 2023 Zhang et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Zhang, Youzhi Yao, Sijie Chen, Peng Prediction of hot spots towards drug discovery by protein sequence embedding with 1D convolutional neural network |
title | Prediction of hot spots towards drug discovery by protein sequence embedding with 1D convolutional neural network |
title_full | Prediction of hot spots towards drug discovery by protein sequence embedding with 1D convolutional neural network |
title_fullStr | Prediction of hot spots towards drug discovery by protein sequence embedding with 1D convolutional neural network |
title_full_unstemmed | Prediction of hot spots towards drug discovery by protein sequence embedding with 1D convolutional neural network |
title_short | Prediction of hot spots towards drug discovery by protein sequence embedding with 1D convolutional neural network |
title_sort | prediction of hot spots towards drug discovery by protein sequence embedding with 1d convolutional neural network |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10506709/ https://www.ncbi.nlm.nih.gov/pubmed/37721924 http://dx.doi.org/10.1371/journal.pone.0290899 |
work_keys_str_mv | AT zhangyouzhi predictionofhotspotstowardsdrugdiscoverybyproteinsequenceembeddingwith1dconvolutionalneuralnetwork AT yaosijie predictionofhotspotstowardsdrugdiscoverybyproteinsequenceembeddingwith1dconvolutionalneuralnetwork AT chenpeng predictionofhotspotstowardsdrugdiscoverybyproteinsequenceembeddingwith1dconvolutionalneuralnetwork |
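
The description field mentions two standard techniques: SMOTE oversampling of positive samples and evaluation by F1 score. The sketch below is only an illustration of those two ideas, not the authors' implementation. The function names `smote_like_sample` and `f1_score` are hypothetical, and the first function shows only the interpolation step of SMOTE (the full algorithm also selects the second point from the k nearest minority-class neighbours).

```python
import random


def smote_like_sample(a, b, rng):
    """Return a synthetic point on the segment between feature vectors a and b.

    Assumption: a and b are two positive-class (hot spot) feature vectors of
    equal length. Real SMOTE would pick b among the nearest neighbours of a.
    """
    t = rng.random()  # interpolation weight in [0, 1)
    return [x + t * (y - x) for x, y in zip(a, b)]


def f1_score(y_true, y_pred):
    """Compute the F1 score for binary labels (1 = hot spot, 0 = non-hot spot)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

F1 is the harmonic mean of precision and recall, which makes it a common choice when positive samples are scarce, as the description suggests is the case for hot spot residues.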