Leveraging heterogeneous data from GHS toxicity annotations, molecular and protein target descriptors and Tox21 assay readouts to predict and rationalise acute toxicity
Despite the increasing knowledge in both the chemical and biological domains, the assimilation and exploration of heterogeneous datasets, encoding information about the chemical, bioactivity and phenotypic properties of compounds, remains a challenge due to the requirement for overlap between chemicals a...
Main authors: | Allen, Chad H. G.; Mervin, Lewis H.; Mahmoud, Samar Y.; Bender, Andreas |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing 2019 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6544914/ https://www.ncbi.nlm.nih.gov/pubmed/31152262 http://dx.doi.org/10.1186/s13321-019-0356-5 |
_version_ | 1783423310623670272 |
---|---|
author | Allen, Chad H. G. Mervin, Lewis H. Mahmoud, Samar Y. Bender, Andreas |
author_facet | Allen, Chad H. G. Mervin, Lewis H. Mahmoud, Samar Y. Bender, Andreas |
author_sort | Allen, Chad H. G. |
collection | PubMed |
description | Despite the increasing knowledge in both the chemical and biological domains, the assimilation and exploration of heterogeneous datasets, encoding information about the chemical, bioactivity and phenotypic properties of compounds, remains a challenge due to the requirement for overlap between chemicals assayed across the spaces. Here, we have constructed a novel dataset, larger than we have used in prior work, comprising 579 acute oral toxic compounds and 1427 non-toxic compounds derived from regulatory GHS information, along with their corresponding molecular and protein target descriptors and qHTS in vitro assay readouts from the Tox21 project. We found no clear association between the results of a FAFDrugs4 toxicophore screen and the acute oral toxicity classifications for our compound set; and a screen using a subset of the ToxAlerts toxicophores was also of limited utility, with only slight enrichment toward the toxic set (odds ratio of 1.48). We then investigated to what degree toxic and non-toxic compounds could be separated in each of the spaces, to compare their potential contribution to further analyses. Using an LDA projection, we found the largest degree of separation using chemical descriptors (Cohen’s d of 1.95) and the lowest degree of separation between toxicity classes using qHTS descriptors (Cohen’s d of 0.67). To compare the predictivity of the feature spaces for the toxicity endpoint, we next trained Random Forest (RF) acute oral toxicity classifiers on either molecular, protein target or qHTS descriptors. RFs trained on molecular and protein target descriptors were most predictive, with ROC AUC values of 0.80–0.92 and 0.70–0.85, respectively, across three test sets. RFs trained on both chemical and protein target descriptors combined exhibited similar predictive performance to the single-domain models (ROC AUC of 0.80–0.91). Model interpretability was improved by the inclusion of protein target descriptors, which allow the identification of specific targets (e.g. Retinal dehydrogenase) with literature links to toxic modes of action (e.g. oxidative stress). The dataset compiled in this study has been made available for future application. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13321-019-0356-5) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6544914 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-6544914 2019-06-04 Leveraging heterogeneous data from GHS toxicity annotations, molecular and protein target descriptors and Tox21 assay readouts to predict and rationalise acute toxicity Allen, Chad H. G. Mervin, Lewis H. Mahmoud, Samar Y. Bender, Andreas J Cheminform Research Article Despite the increasing knowledge in both the chemical and biological domains, the assimilation and exploration of heterogeneous datasets, encoding information about the chemical, bioactivity and phenotypic properties of compounds, remains a challenge due to the requirement for overlap between chemicals assayed across the spaces. Here, we have constructed a novel dataset, larger than we have used in prior work, comprising 579 acute oral toxic compounds and 1427 non-toxic compounds derived from regulatory GHS information, along with their corresponding molecular and protein target descriptors and qHTS in vitro assay readouts from the Tox21 project. We found no clear association between the results of a FAFDrugs4 toxicophore screen and the acute oral toxicity classifications for our compound set; and a screen using a subset of the ToxAlerts toxicophores was also of limited utility, with only slight enrichment toward the toxic set (odds ratio of 1.48). We then investigated to what degree toxic and non-toxic compounds could be separated in each of the spaces, to compare their potential contribution to further analyses. Using an LDA projection, we found the largest degree of separation using chemical descriptors (Cohen’s d of 1.95) and the lowest degree of separation between toxicity classes using qHTS descriptors (Cohen’s d of 0.67). To compare the predictivity of the feature spaces for the toxicity endpoint, we next trained Random Forest (RF) acute oral toxicity classifiers on either molecular, protein target or qHTS descriptors. RFs trained on molecular and protein target descriptors were most predictive, with ROC AUC values of 0.80–0.92 and 0.70–0.85, respectively, across three test sets. RFs trained on both chemical and protein target descriptors combined exhibited similar predictive performance to the single-domain models (ROC AUC of 0.80–0.91). Model interpretability was improved by the inclusion of protein target descriptors, which allow the identification of specific targets (e.g. Retinal dehydrogenase) with literature links to toxic modes of action (e.g. oxidative stress). The dataset compiled in this study has been made available for future application. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13321-019-0356-5) contains supplementary material, which is available to authorized users. Springer International Publishing 2019-05-31 /pmc/articles/PMC6544914/ /pubmed/31152262 http://dx.doi.org/10.1186/s13321-019-0356-5 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Allen, Chad H. G. Mervin, Lewis H. Mahmoud, Samar Y. Bender, Andreas Leveraging heterogeneous data from GHS toxicity annotations, molecular and protein target descriptors and Tox21 assay readouts to predict and rationalise acute toxicity |
title | Leveraging heterogeneous data from GHS toxicity annotations, molecular and protein target descriptors and Tox21 assay readouts to predict and rationalise acute toxicity |
title_full | Leveraging heterogeneous data from GHS toxicity annotations, molecular and protein target descriptors and Tox21 assay readouts to predict and rationalise acute toxicity |
title_fullStr | Leveraging heterogeneous data from GHS toxicity annotations, molecular and protein target descriptors and Tox21 assay readouts to predict and rationalise acute toxicity |
title_full_unstemmed | Leveraging heterogeneous data from GHS toxicity annotations, molecular and protein target descriptors and Tox21 assay readouts to predict and rationalise acute toxicity |
title_short | Leveraging heterogeneous data from GHS toxicity annotations, molecular and protein target descriptors and Tox21 assay readouts to predict and rationalise acute toxicity |
title_sort | leveraging heterogeneous data from ghs toxicity annotations, molecular and protein target descriptors and tox21 assay readouts to predict and rationalise acute toxicity |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6544914/ https://www.ncbi.nlm.nih.gov/pubmed/31152262 http://dx.doi.org/10.1186/s13321-019-0356-5 |
work_keys_str_mv | AT allenchadhg leveragingheterogeneousdatafromghstoxicityannotationsmolecularandproteintargetdescriptorsandtox21assayreadoutstopredictandrationaliseacutetoxicity AT mervinlewish leveragingheterogeneousdatafromghstoxicityannotationsmolecularandproteintargetdescriptorsandtox21assayreadoutstopredictandrationaliseacutetoxicity AT mahmoudsamary leveragingheterogeneousdatafromghstoxicityannotationsmolecularandproteintargetdescriptorsandtox21assayreadoutstopredictandrationaliseacutetoxicity AT benderandreas leveragingheterogeneousdatafromghstoxicityannotationsmolecularandproteintargetdescriptorsandtox21assayreadoutstopredictandrationaliseacutetoxicity |
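
The description field in this record reports two summary statistics: an odds ratio (1.48) for the ToxAlerts toxicophore screen and Cohen's d (1.95 and 0.67) for the separation of toxic and non-toxic compounds in an LDA projection. The following is a minimal sketch of how such statistics are commonly computed, assuming the pooled-standard-deviation form of Cohen's d and a plain 2x2 contingency table; the record does not say which variants the authors used. The function names and every input number are hypothetical illustrations, not data from the article.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using a pooled standard deviation (sample variances).

    Assumption: this is the common pooled-SD definition; the article
    may have used a different variant.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def odds_ratio(hit_toxic, miss_toxic, hit_nontoxic, miss_nontoxic):
    """Odds of an alert hit among toxic compounds divided by the
    odds of a hit among non-toxic compounds (2x2 table)."""
    return (hit_toxic / miss_toxic) / (hit_nontoxic / miss_nontoxic)


# Toy one-dimensional projection scores, invented for illustration only.
toxic_scores = [2.0, 3.0, 4.0]
nontoxic_scores = [0.0, 1.0, 2.0]
print("Cohen's d:", cohens_d(toxic_scores, nontoxic_scores))

# Hypothetical 2x2 counts, not taken from the article.
print("Odds ratio:", odds_ratio(30, 70, 20, 80))
```

An odds ratio close to 1 indicates that alert hits are almost equally likely in both classes, which is consistent with the record's description of the screen as being of limited utility.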