
Prediction of hot spots towards drug discovery by protein sequence embedding with 1D convolutional neural network


Bibliographic Details
Main Authors:	Zhang, Youzhi; Yao, Sijie; Chen, Peng
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2023
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10506709/ https://www.ncbi.nlm.nih.gov/pubmed/37721924 http://dx.doi.org/10.1371/journal.pone.0290899

collection	PubMed
description	Protein hotspot residues are key sites that mediate protein-protein interactions. Accurate identification of these residues is essential for understanding how proteins carry out their functions and for designing drug targets. Most current research uses machine learning methods to predict hot spots from known interface residues; these methods rely on manually extracted features of amino acid residues, drawn from sequence, structure, evolution, energy, and other information, to train and test the models. This feature-extraction process can be cumbersome, time-consuming, and laborious. This paper proposes Embed-1dCNN, which combines a pre-trained protein sequence embedding model with a one-dimensional convolutional neural network to predict protein hotspot residues. To obtain a larger sample, this work integrates and extracts data from the ASEdb, BID, SKEMPI, and dbMPIKT datasets to generate a new dataset, and uses the SMOTE algorithm to oversample the positive samples when forming the training set. Experimental results show that the method achieves an F1 score of 0.82 on the test set. Compared with other hot spot prediction methods, the model achieved better prediction performance.
format	Online Article Text
id	pubmed-10506709
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-10506709 2023-09-19 PLoS One Research Article Public Library of Science 2023-09-18 /pmc/articles/PMC10506709/ /pubmed/37721924 http://dx.doi.org/10.1371/journal.pone.0290899 Text en © 2023 Zhang et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Prediction of hot spots towards drug discovery by protein sequence embedding with 1D convolutional neural network
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10506709/ https://www.ncbi.nlm.nih.gov/pubmed/37721924 http://dx.doi.org/10.1371/journal.pone.0290899
