Classification of HIV-1 Protease Inhibitors by Machine Learning Methods

HIV-1 protease plays an important role in the processing of virus infection. Protease is an effective therapeutic target for the treatment of HIV-1. Our data set is based on a selection of 4855 HIV-1 protease inhibitors (PIs) from ChEMBL. A series of 15 classification models for pr...

Bibliographic Details
Main authors: Li, Yang; Tian, Yujia; Qin, Zijian; Yan, Aixia
Format: Online Article Text
Language: English
Published: American Chemical Society 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6288788/
https://www.ncbi.nlm.nih.gov/pubmed/30556015
http://dx.doi.org/10.1021/acsomega.8b01843
author Li, Yang
Tian, Yujia
Qin, Zijian
Yan, Aixia
collection PubMed
description HIV-1 protease plays an important role in the processing of virus infection. Protease is an effective therapeutic target for the treatment of HIV-1. Our data set is based on a selection of 4855 HIV-1 protease inhibitors (PIs) from ChEMBL. A series of 15 classification models for predicting the active inhibitors was built by machine learning methods, including k-nearest neighbors (K-NN), decision tree (DT), random forest (RF), support vector machine (SVM), and deep neural network (DNN). The molecular structures were characterized by (1) fingerprint descriptors, including MACCS fingerprints and PubChem fingerprints, and (2) physicochemical descriptors calculated by CORINA Symphony. The prediction accuracies of all of the models exceed 70% on the test set; the best accuracy of 83.07% was obtained by model 4A, which was built by the SVM method based on MACCS fingerprint descriptors. Nine consensus models were built with three kinds of descriptors, which combined all of the machine learning methods using the “consensus prediction”. Model C3(a), developed with MACCS fingerprint descriptors, showed the highest accuracy on both the training set (91.96%) and the test set (83.15%). An external validation set including 35 989 compounds from the DUD database and 239 active inhibitors from the recent literature was used to verify the performance of our model. The best prediction accuracy of 98.37% was obtained by model 3C, which was built by RF based on CORINA Symphony descriptors. In addition, analysis of the molecular descriptors shows that the aromatic system and atoms related to hydrogen bonding make important contributions to the bioactivity of PIs.
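The abstract mentions a "consensus prediction" that combines the individual machine learning methods, but it does not say how the individual outputs are merged. The following is a minimal sketch that assumes a simple majority vote over binary labels (1 = active, 0 = inactive). The function names `consensus_predict` and `accuracy`, and the toy predictions and labels, are hypothetical illustrations and are not data or code from the article.

```python
from typing import Dict, List, Sequence


def consensus_predict(predictions: Dict[str, Sequence[int]]) -> List[int]:
    """Combine per-model binary predictions by majority vote.

    Assumption: the consensus is a plain majority vote; the article's
    actual combination rule is not given in the abstract.
    Ties (possible only with an even number of models) resolve to 0.
    """
    columns = list(predictions.values())
    if not columns:
        raise ValueError("at least one model is required")
    n_compounds = len(columns[0])
    if any(len(col) != n_compounds for col in columns):
        raise ValueError("all models must predict the same compounds")
    n_models = len(columns)
    # zip(*columns) yields, for each compound, the votes of every model.
    return [1 if 2 * sum(votes) > n_models else 0 for votes in zip(*columns)]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of compounds whose predicted label matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical predictions for six compounds from the five method types
# named in the abstract (toy values, not results from the article).
toy_predictions = {
    "K-NN": [1, 0, 1, 1, 0, 0],
    "DT":   [1, 1, 0, 1, 0, 0],
    "RF":   [1, 0, 1, 0, 0, 1],
    "SVM":  [1, 0, 1, 1, 1, 0],
    "DNN":  [0, 0, 1, 1, 0, 0],
}
toy_labels = [1, 0, 1, 1, 0, 1]

toy_consensus = consensus_predict(toy_predictions)
toy_accuracy = accuracy(toy_labels, toy_consensus)
print(toy_consensus, round(toy_accuracy, 4))
```

With an odd number of models and binary labels, a majority vote never ties, which is one reason it is a common default for this kind of ensemble. Weighted or probability-averaged schemes are equally plausible readings of "consensus prediction".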
format Online
Article
Text
id pubmed-6288788
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-62887882018-12-12 Classification of HIV-1 Protease Inhibitors by Machine Learning Methods Li, Yang Tian, Yujia Qin, Zijian Yan, Aixia ACS Omega
American Chemical Society 2018-11-21 /pmc/articles/PMC6288788/ /pubmed/30556015 http://dx.doi.org/10.1021/acsomega.8b01843 Text en Copyright © 2018 American Chemical Society This is an open access article published under an ACS AuthorChoice License (http://pubs.acs.org/page/policy/authorchoice_termsofuse.html), which permits copying and redistribution of the article or any adaptations for non-commercial purposes.
title Classification of HIV-1 Protease Inhibitors by Machine Learning Methods