
Ensemble modeling with machine learning and deep learning to provide interpretable generalized rules for classifying CNS drugs with high prediction power

The trade-off between the predictive power of machine learning (ML) and deep learning (DL) models and their interpretability has been a rising concern in central nervous system-related quantitative structure–activity relationship (CNS-QSAR) analysis. Many state-of-the-art predictive models fail to provi...


Bibliographic Details
Main Authors:	Yu, Tzu-Hui; Su, Bo-Han; Battalora, Leo Chander; Liu, Sin; Tseng, Yufeng Jane
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2021
Subjects:	Problem Solving Protocol
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8769704/ https://www.ncbi.nlm.nih.gov/pubmed/34530437 http://dx.doi.org/10.1093/bib/bbab377

_version_	1784635208553725952
author	Yu, Tzu-Hui; Su, Bo-Han; Battalora, Leo Chander; Liu, Sin; Tseng, Yufeng Jane
author_sort	Yu, Tzu-Hui
collection	PubMed
description	The trade-off between the predictive power of machine learning (ML) and deep learning (DL) models and their interpretability has been a rising concern in central nervous system-related quantitative structure–activity relationship (CNS-QSAR) analysis. Many state-of-the-art predictive models fail to provide structural insights because of their black-box nature. This lack of interpretability, and the resulting difficulty of deriving simple rules, is a challenge for CNS-QSAR models. To address these issues, we developed a protocol that combines the power of ML and DL to generate a set of simple, easily interpreted rules with high prediction power. A data set of 940 marketed drugs (315 CNS-active, 625 CNS-inactive) was modeled with support vector machine and graph convolutional network algorithms. Individual ML and DL models were also constructed for comparison. The performance of these models was evaluated using an additional external data set of 117 marketed drugs (42 CNS-active, 75 CNS-inactive). Fingerprint-split validation was adopted to ensure model stringency and generalizability. The resulting novel hybrid ensemble model outperformed the constituent traditional QSAR models, with an accuracy of 0.96 and an F1 score of 0.95. With the interpretability provided by this protocol, our model laid down a set of simple physicochemical rules, based on six sub-structural features, to determine whether a compound can be a CNS drug. These rules displayed higher classification ability than classical guidelines, with higher specificity and more mechanistic insight than rules for blood–brain barrier permeability alone. This hybrid protocol can potentially be used for other drug property predictions.
format	Online Article Text
id	pubmed-8769704
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-8769704 2022-01-20 Brief Bioinform Problem Solving Protocol Oxford University Press 2021-09-17 /pmc/articles/PMC8769704/ /pubmed/34530437 http://dx.doi.org/10.1093/bib/bbab377 Text en © The Author(s) 2021. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Ensemble modeling with machine learning and deep learning to provide interpretable generalized rules for classifying CNS drugs with high prediction power
title_sort	ensemble modeling with machine learning and deep learning to provide interpretable generalized rules for classifying cns drugs with high prediction power
topic	Problem Solving Protocol
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8769704/ https://www.ncbi.nlm.nih.gov/pubmed/34530437 http://dx.doi.org/10.1093/bib/bbab377
