Predicting cytotoxicity from heterogeneous data sources with Bayesian learning


Bibliographic Details
Main authors: Langdon, Sarah R, Mulgrew, Joanna, Paolini, Gaia V, van Hoorn, Willem P
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3004804/
https://www.ncbi.nlm.nih.gov/pubmed/21143909
http://dx.doi.org/10.1186/1758-2946-2-11
author Langdon, Sarah R
Mulgrew, Joanna
Paolini, Gaia V
van Hoorn, Willem P
collection PubMed
description BACKGROUND: We collected data from over 80 different cytotoxicity assays from Pfizer in-house work as well as from public sources and investigated the feasibility of using these datasets, which come from a variety of assay formats (having, for instance, different measured endpoints, incubation times and cell types), to derive a general cytotoxicity model. Our main aim was to derive a computational model based on these data that can highlight potentially cytotoxic series early in the drug discovery process.
RESULTS: We developed Bayesian models for each assay using Scitegic FCFP_6 fingerprints together with the default physical property descriptors. Pairs of assays that are mutually predictive were identified by calculating the ROC score of the model derived from one assay predicting the experimental outcome of the other, and vice versa. The prediction pairs were visualised in a network where nodes are assays and edges are drawn for ROC scores >0.60 in both directions. We observed that, if assay pairs (A, B) and (B, C) were mutually predictive, this was often not the case for the pair (A, C). The results from 48 assays connected to each other were merged into one training set of 145,590 compounds and a general cytotoxicity model was derived. The model was cross-validated and also validated with a set of 89 FDA-approved drug compounds.
CONCLUSIONS: We have generated a predictive model for general cytotoxicity which could speed up the drug discovery process in multiple ways. Firstly, this analysis has shown that the outcomes of different assay formats can be mutually predictive, thus removing the need to submit a potentially toxic compound to multiple assays. Furthermore, this analysis enables selection of (a) the easiest-to-run assay as the corporate standard, or (b) the most descriptive panel of assays by including assays whose outcomes are not mutually predictive. The model is no replacement for a cytotoxicity assay but opens the opportunity to be more selective about which compounds are submitted to it. On a more mundane level, having data from more than 80 assays in one dataset answers, for the first time, the question: "What are the known cytotoxic compounds from the Pfizer compound collection?" Finally, having a predictive cytotoxicity model will assist the design of new compounds with a desired cytotoxicity profile, since comparison of the model output with data from an in vitro safety/toxicology assay suggests one is predictive of the other.
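The network step described under RESULTS can be illustrated with a minimal Python sketch. It is not the authors' implementation: the assay names, the ROC values, and the helper functions `mutual_edges` and `connected_components` are hypothetical, and only the 0.60 cutoff comes from the abstract. The sketch assumes the pairwise ROC scores are already available as a dictionary keyed by (training assay, test assay).

```python
from collections import defaultdict, deque

ROC_THRESHOLD = 0.60  # edge cutoff quoted in the abstract


def mutual_edges(roc, threshold=ROC_THRESHOLD):
    """Return assay pairs whose ROC score exceeds the threshold in both directions.

    `roc[(a, b)]` is assumed to hold the ROC score of the model trained on
    assay `a` when predicting the outcomes of assay `b` (hypothetical layout).
    """
    edges = set()
    for (train, test), score in roc.items():
        reverse = roc.get((test, train))
        if train == test or reverse is None:
            continue
        if score > threshold and reverse > threshold:
            edges.add(frozenset((train, test)))
    return edges


def connected_components(edges):
    """Group assays into connected components with a breadth-first search."""
    adjacency = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# Made-up scores for four hypothetical assays A-D (illustration only).
toy_roc = {
    ("A", "B"): 0.72, ("B", "A"): 0.68,
    ("B", "C"): 0.65, ("C", "B"): 0.70,
    ("A", "C"): 0.55, ("C", "A"): 0.63,  # fails in one direction: no edge
    ("A", "D"): 0.50, ("D", "A"): 0.52,  # fails in both directions: no edge
}

edges = mutual_edges(toy_roc)
components = connected_components(edges)
```

In this toy input, A and C are not mutually predictive, yet they fall into the same connected component through B. This mirrors the non-transitivity noted in the abstract and shows why merging assays by connectivity can pool assays that do not directly predict one another.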
format Text
id pubmed-3004804
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3004804 2010-12-21 J Cheminform Research Article BioMed Central 2010-12-09 /pmc/articles/PMC3004804/ /pubmed/21143909 http://dx.doi.org/10.1186/1758-2946-2-11 Text en Copyright ©2010 Langdon et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Predicting cytotoxicity from heterogeneous data sources with Bayesian learning
topic Research Article