Predicting cytotoxicity from heterogeneous data sources with Bayesian learning
BACKGROUND: We collected data from over 80 different cytotoxicity assays from Pfizer in-house work as well as from public sources and investigated the feasibility of using these datasets, which come from a variety of assay formats (having for instance different measured endpoints, incubation times a...
Main authors: | Langdon, Sarah R; Mulgrew, Joanna; Paolini, Gaia V; van Hoorn, Willem P |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2010 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3004804/ https://www.ncbi.nlm.nih.gov/pubmed/21143909 http://dx.doi.org/10.1186/1758-2946-2-11 |
_version_ | 1782194027232231424 |
---|---|
author | Langdon, Sarah R; Mulgrew, Joanna; Paolini, Gaia V; van Hoorn, Willem P |
author_sort | Langdon, Sarah R |
collection | PubMed |
description | BACKGROUND: We collected data from over 80 different cytotoxicity assays from Pfizer in-house work as well as from public sources and investigated the feasibility of using these datasets, which come from a variety of assay formats (having for instance different measured endpoints, incubation times and cell types) to derive a general cytotoxicity model. Our main aim was to derive a computational model based on this data that can highlight potentially cytotoxic series early in the drug discovery process. RESULTS: We developed Bayesian models for each assay using Scitegic FCFP_6 fingerprints together with the default physical property descriptors. Pairs of assays that are mutually predictive were identified by calculating the ROC score of the model derived from one predicting the experimental outcome of the other, and vice versa. The prediction pairs were visualised in a network where nodes are assays and edges are drawn for ROC scores >0.60 in both directions. We observed that, if assay pairs (A, B) and (B, C) were mutually predictive, this was often not the case for the pair (A, C). The results from 48 assays connected to each other were merged in one training set of 145590 compounds and a general cytotoxicity model was derived. The model has been cross-validated as well as being validated with a set of 89 FDA approved drug compounds. CONCLUSIONS: We have generated a predictive model for general cytotoxicity which could speed up the drug discovery process in multiple ways. Firstly, this analysis has shown that the outcomes of different assay formats can be mutually predictive, thus removing the need to submit a potentially toxic compound to multiple assays. Furthermore, this analysis enables selection of (a) the easiest-to-run assay as corporate standard, or (b) the most descriptive panel of assays by including assays whose outcomes are not mutually predictive. The model is no replacement for a cytotoxicity assay but opens the opportunity to be more selective about which compounds are to be submitted to it. On a more mundane level, having data from more than 80 assays in one dataset answers, for the first time, the question - "what are the known cytotoxic compounds from the Pfizer compound collection?" Finally, having a predictive cytotoxicity model will assist the design of new compounds with a desired cytotoxicity profile, since comparison of the model output with data from an in vitro safety/toxicology assay suggests one is predictive of the other. |
format | Text |
id | pubmed-3004804 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3004804 2010-12-21 Predicting cytotoxicity from heterogeneous data sources with Bayesian learning Langdon, Sarah R Mulgrew, Joanna Paolini, Gaia V van Hoorn, Willem P J Cheminform Research Article BioMed Central 2010-12-09 /pmc/articles/PMC3004804/ /pubmed/21143909 http://dx.doi.org/10.1186/1758-2946-2-11 Text en Copyright ©2010 Langdon et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Predicting cytotoxicity from heterogeneous data sources with Bayesian learning |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3004804/ https://www.ncbi.nlm.nih.gov/pubmed/21143909 http://dx.doi.org/10.1186/1758-2946-2-11 |
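
The RESULTS section of the abstract describes a network in which assays are nodes and an edge is drawn only when the ROC score exceeds 0.60 in both directions. The record itself contains no code, so the following is only a minimal Python sketch of that idea under stated assumptions: the function names (`roc_auc`, `mutual_edges`, `connected_components`), the assay labels, and every ROC value are hypothetical and chosen for illustration. It is not the authors' implementation, and it does not build the Bayesian models themselves.

```python
from collections import defaultdict

ROC_THRESHOLD = 0.60  # edge cutoff quoted in the abstract


def roc_auc(scores, labels):
    """ROC area as the fraction of (active, inactive) pairs ranked correctly.

    Ties between an active and an inactive compound count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mutual_edges(roc, threshold=ROC_THRESHOLD):
    """Return undirected edges (a, b) whose ROC exceeds threshold both ways.

    `roc` maps (model_assay, test_assay) to the ROC score obtained when the
    model trained on the first assay predicts the outcomes of the second.
    """
    edges = set()
    for (a, b), score in roc.items():
        if a < b and score > threshold and roc.get((b, a), 0.0) > threshold:
            edges.add((a, b))
    return edges


def connected_components(nodes, edges):
    """Group assays that are linked, directly or indirectly, by mutual edges."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical ROC scores for four made-up assays A-D (not from the paper).
example_roc = {
    ("A", "B"): 0.72, ("B", "A"): 0.68,
    ("B", "C"): 0.65, ("C", "B"): 0.70,
    ("A", "C"): 0.55, ("C", "A"): 0.58,  # A and C are not mutually predictive
    ("A", "D"): 0.66, ("D", "A"): 0.52,  # predictive in one direction only
}
example_edges = mutual_edges(example_roc)
example_components = connected_components("ABCD", example_edges)
```

In this toy input, A-B and B-C are mutually predictive while A-C is not, which mirrors the non-transitivity noted in the abstract; A, B and C still land in one connected component, which is the kind of grouping the abstract describes merging into a single training set.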