
Improve hot region prediction by analyzing different machine learning algorithms

Bibliographic Details
Main authors: Hu, Jing; Zhou, Longwei; Li, Bo; Zhang, Xiaolong; Chen, Nansheng
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8543831/
https://www.ncbi.nlm.nih.gov/pubmed/34696728
http://dx.doi.org/10.1186/s12859-021-04420-0
Description
Summary: BACKGROUND: In drug and protein design, it is crucial to recognize hot regions in protein–protein interactions. Each hot region of a protein–protein interaction is composed of at least three hot spots, which play an important role in binding. However, identifying hot spots through biological experiments is time-consuming and labor-intensive. If predictive models based on machine learning methods can be trained, the drug design process can be effectively accelerated. RESULTS: The results show that different machine learning algorithms perform similarly when evaluated using the F-measure. The main differences between these methods lie in recall and precision. Since the key attribute of hot regions is that they are tightly packed, we used a clustering algorithm to predict hot regions. By combining Gaussian Naïve Bayes and DBSCAN, the F-measure of hot region prediction can reach 0.809. CONCLUSIONS: In this paper, different machine learning models, including Gaussian Naïve Bayes, SVM, XGBoost, Random Forest, and Artificial Neural Network, are used to predict hot spots. The experimental results show that combining a hot spot classification algorithm with a higher recall rate and a clustering algorithm with higher precision can effectively improve the accuracy of hot region prediction. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04420-0.
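
The two ideas the summary relies on, the F-measure and density-based clustering of predicted hot spots into hot regions, can be illustrated with a minimal sketch. It is not the authors' implementation: the coordinates, the `eps` radius, and the function names `f_measure` and `dbscan` are hypothetical, and the classifier step (for example Gaussian Naïve Bayes) is assumed to have already produced the list of predicted hot spots. Setting `min_pts=3` is an assumption that mirrors the statement that a hot region contains at least three hot spots.

```python
from math import dist  # Euclidean distance, Python 3.8+


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def dbscan(points, eps, min_pts):
    """Plain-Python DBSCAN sketch: one cluster label per point, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point joins the cluster
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point, keep expanding
                queue.extend(nbrs)
    return labels


# Hypothetical 3D coordinates of residues predicted to be hot spots.
predicted_hot_spots = [
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),  # tightly packed trio
    (10.0, 10.0, 10.0),                                  # isolated residue
    (20.0, 20.0, 20.0), (21.0, 20.0, 20.0),              # only two neighbors
]

# eps is an illustrative radius, not a value taken from the paper.
labels = dbscan(predicted_hot_spots, eps=1.5, min_pts=3)
print(labels)
print(f_measure(precision=0.5, recall=0.5))
```

In this toy input only the first three residues form a hot region; the isolated residue and the pair are labeled as noise. That filtering is the sense in which a clustering step can raise precision after a high-recall classifier has over-predicted hot spots.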