
Leveraging heterogeneous data from GHS toxicity annotations, molecular and protein target descriptors and Tox21 assay readouts to predict and rationalise acute toxicity

Despite the increasing knowledge in both the chemical and biological domains, the assimilation and exploration of heterogeneous datasets, encoding information about the chemical, bioactivity and phenotypic properties of compounds, remains a challenge due to the requirement for overlap between chemicals a...


Bibliographic Details
Main Authors:	Allen, Chad H. G.; Mervin, Lewis H.; Mahmoud, Samar Y.; Bender, Andreas
Format:	Online Article Text
Language:	English
Published:	Springer International Publishing 2019
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6544914/ https://www.ncbi.nlm.nih.gov/pubmed/31152262 http://dx.doi.org/10.1186/s13321-019-0356-5

_version_	1783423310623670272
author	Allen, Chad H. G.; Mervin, Lewis H.; Mahmoud, Samar Y.; Bender, Andreas
collection	PubMed
description	Despite the increasing knowledge in both the chemical and biological domains, the assimilation and exploration of heterogeneous datasets, encoding information about the chemical, bioactivity and phenotypic properties of compounds, remains a challenge due to the requirement for overlap between chemicals assayed across the spaces. Here, we have constructed a novel dataset, larger than we have used in prior work, comprising 579 acute oral toxic compounds and 1427 non-toxic compounds derived from regulatory GHS information, along with their corresponding molecular and protein target descriptors and qHTS in vitro assay readouts from the Tox21 project. We found no clear association between the results of a FAFDrugs4 toxicophore screen and the acute oral toxicity classifications for our compound set; and a screen using a subset of the ToxAlerts toxicophores was also of limited utility, with only slight enrichment toward the toxic set (odds ratio of 1.48). We then investigated to what degree toxic and non-toxic compounds could be separated in each of the spaces, to compare their potential contribution to further analyses. Using an LDA projection, we found the largest degree of separation using chemical descriptors (Cohen’s d of 1.95) and the lowest degree of separation between toxicity classes using qHTS descriptors (Cohen’s d of 0.67). To compare the predictivity of the feature spaces for the toxicity endpoint, we next trained Random Forest (RF) acute oral toxicity classifiers on either molecular, protein target or qHTS descriptors. RFs trained on molecular and protein target descriptors were most predictive, with ROC AUC values of 0.80–0.92 and 0.70–0.85, respectively, across three test sets. RFs trained on both chemical and protein target descriptors combined exhibited similar predictive performance to the single-domain models (ROC AUC of 0.80–0.91). Model interpretability was improved by the inclusion of protein target descriptors, which allow the identification of specific targets (e.g. Retinal dehydrogenase) with literature links to toxic modes of action (e.g. oxidative stress). The dataset compiled in this study has been made available for future application. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13321-019-0356-5) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-6544914
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Springer International Publishing
record_format	MEDLINE/PubMed
spelling	pubmed-6544914 2019-06-04 J Cheminform Research Article Springer International Publishing 2019-05-31 /pmc/articles/PMC6544914/ /pubmed/31152262 http://dx.doi.org/10.1186/s13321-019-0356-5 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Leveraging heterogeneous data from GHS toxicity annotations, molecular and protein target descriptors and Tox21 assay readouts to predict and rationalise acute toxicity
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6544914/ https://www.ncbi.nlm.nih.gov/pubmed/31152262 http://dx.doi.org/10.1186/s13321-019-0356-5
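
The abstract in this record reports two effect-size statistics: an odds ratio (1.48) for toxicophore alerts and Cohen's d (1.95 and 0.67) for class separation along an LDA projection. The record does not say which formula variants the authors used, so the following is only a minimal sketch of the common textbook definitions, assuming a pooled-standard-deviation Cohen's d and a simple 2x2 odds ratio. The function names and the toy numbers are hypothetical and are not taken from the study.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using a pooled standard deviation (sample variances).

    Assumption: this is the common textbook variant; the record does not
    state which variant the authors applied to their LDA projections.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def odds_ratio(toxic_hits, toxic_total, nontoxic_hits, nontoxic_total):
    """Odds of an alert firing in the toxic set over the odds in the non-toxic set."""
    toxic_odds = toxic_hits / (toxic_total - toxic_hits)
    nontoxic_odds = nontoxic_hits / (nontoxic_total - nontoxic_hits)
    return toxic_odds / nontoxic_odds


# Toy 1-D projection scores (hypothetical, for illustration only).
toxic_scores = [2.0, 3.0, 4.0]
nontoxic_scores = [0.0, 1.0, 2.0]
print(cohens_d(toxic_scores, nontoxic_scores))

# Toy alert counts (hypothetical): 50 of 100 toxic and 25 of 100 non-toxic compounds flagged.
print(odds_ratio(50, 100, 25, 100))
```

An odds ratio close to 1, such as the 1.48 quoted in the abstract, means the alerts fire only slightly more often among toxic compounds, which matches the abstract's description of "only slight enrichment".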
