Support Vector Machine model for hERG inhibitory activities based on the integrated hERG database using descriptor selection by NSGA-II
Assessing the hERG liability in the early stages of drug discovery programs is important. The recent increase of hERG-related information in public databases has enabled various successful applications of machine learning techniques to predict hERG inhibition. However, most of these studies construct...
Main authors: | Ogura, Keiji; Sato, Tomohiro; Yuki, Hitomi; Honma, Teruki |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2019 |
Subjects: | Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6704061/ https://www.ncbi.nlm.nih.gov/pubmed/31434908 http://dx.doi.org/10.1038/s41598-019-47536-3 |
_version_ | 1783445427616481280 |
---|---|
author | Ogura, Keiji; Sato, Tomohiro; Yuki, Hitomi; Honma, Teruki |
author_facet | Ogura, Keiji; Sato, Tomohiro; Yuki, Hitomi; Honma, Teruki |
author_sort | Ogura, Keiji |
collection | PubMed |
description | Assessing the hERG liability in the early stages of drug discovery programs is important. The recent increase of hERG-related information in public databases has enabled various successful applications of machine learning techniques to predict hERG inhibition. However, most of these studies constructed their datasets from only one database, limiting the predictability and scope of the models. In this study, a hERG classification model was constructed using the largest dataset for hERG inhibition, built by integrating multiple databases. The integrated dataset consisted of more than 291,000 structurally diverse compounds derived from ChEMBL, GOSTAR, PubChem, and hERGCentral. The prediction model was built using a support vector machine (SVM) with descriptor selection based on the Non-dominated Sorting Genetic Algorithm-II (NSGA-II), which optimizes the descriptor set for maximum prediction performance with the minimal number of descriptors. The SVM classification model using 72 selected descriptors and ECFP_4 structural fingerprints recorded a kappa statistic of 0.733 and an accuracy of 0.984 for the test set, substantially outperforming the current commercial applications for hERG prediction. Finally, the applicability domain of the prediction model was assessed based on the molecular similarity between the training set and test set compounds. |
format | Online Article Text |
id | pubmed-6704061 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-6704061 2019-08-23 Support Vector Machine model for hERG inhibitory activities based on the integrated hERG database using descriptor selection by NSGA-II Ogura, Keiji; Sato, Tomohiro; Yuki, Hitomi; Honma, Teruki Sci Rep Article Nature Publishing Group UK 2019-08-21 /pmc/articles/PMC6704061/ /pubmed/31434908 http://dx.doi.org/10.1038/s41598-019-47536-3 Text en © The Author(s) 2019 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Ogura, Keiji Sato, Tomohiro Yuki, Hitomi Honma, Teruki Support Vector Machine model for hERG inhibitory activities based on the integrated hERG database using descriptor selection by NSGA-II |
title | Support Vector Machine model for hERG inhibitory activities based on the integrated hERG database using descriptor selection by NSGA-II |
title_sort | support vector machine model for herg inhibitory activities based on the integrated herg database using descriptor selection by nsga-ii |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6704061/ https://www.ncbi.nlm.nih.gov/pubmed/31434908 http://dx.doi.org/10.1038/s41598-019-47536-3 |
work_keys_str_mv | AT ogurakeiji supportvectormachinemodelforherginhibitoryactivitiesbasedontheintegratedhergdatabaseusingdescriptorselectionbynsgaii AT satotomohiro supportvectormachinemodelforherginhibitoryactivitiesbasedontheintegratedhergdatabaseusingdescriptorselectionbynsgaii AT yukihitomi supportvectormachinemodelforherginhibitoryactivitiesbasedontheintegratedhergdatabaseusingdescriptorselectionbynsgaii AT honmateruki supportvectormachinemodelforherginhibitoryactivitiesbasedontheintegratedhergdatabaseusingdescriptorselectionbynsgaii |
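
The description reports both an accuracy of 0.984 and a kappa statistic of 0.733 for the test set. The gap between the two is typical of imbalanced classification data, where most compounds belong to one class. A minimal sketch of how both metrics are computed from a binary confusion matrix follows. The function name and the counts are hypothetical, chosen only to illustrate the effect, and are not taken from the paper.

```python
def accuracy_and_kappa(tp, fp, fn, tn):
    """Return (accuracy, Cohen's kappa) for a binary confusion matrix.

    tp/fp/fn/tn are the counts of true positives, false positives,
    false negatives and true negatives. Kappa is undefined (division
    by zero) when chance agreement equals 1, e.g. when every label and
    every prediction fall in a single class.
    """
    n = tp + fp + fn + tn
    p_observed = (tp + tn) / n  # observed agreement, i.e. accuracy

    # Chance agreement: for each class, multiply the rate at which it
    # occurs in the true labels by the rate at which it is predicted.
    p_pos = ((tp + fn) / n) * ((tp + fp) / n)
    p_neg = ((tn + fp) / n) * ((tn + fn) / n)
    p_expected = p_pos + p_neg

    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_observed, kappa


# Hypothetical counts (NOT from the paper): 1,000 compounds,
# of which 50 are actual inhibitors.
acc, kappa = accuracy_and_kappa(tp=30, fp=5, fn=20, tn=945)
print(f"accuracy={acc:.3f}, kappa={kappa:.3f}")
```

Because kappa subtracts the agreement expected by chance, a classifier that misses many of the rare positives can still score a high accuracy while its kappa stays noticeably lower, which is why the two figures are usually read together on datasets like this one.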