Cytochrome P450 site of metabolism prediction from 2D topological fingerprints using GPU accelerated probabilistic classifiers

BACKGROUND: The prediction of sites and products of metabolism in xenobiotic compounds is key to the development of new chemical entities, where screening potential metabolites for toxicity or unwanted side-effects is of crucial importance. In this work 2D topological fingerprints are used to encode...

Bibliographic Details
Main Authors: Tyzack, Jonathan D; Mussa, Hamse Y; Williamson, Mark J; Kirchmair, Johannes; Glen, Robert C
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4047555/
https://www.ncbi.nlm.nih.gov/pubmed/24959208
http://dx.doi.org/10.1186/1758-2946-6-29
author Tyzack, Jonathan D
Mussa, Hamse Y
Williamson, Mark J
Kirchmair, Johannes
Glen, Robert C
collection PubMed
description BACKGROUND: The prediction of sites and products of metabolism in xenobiotic compounds is key to the development of new chemical entities, where screening potential metabolites for toxicity or unwanted side-effects is of crucial importance. In this work, 2D topological fingerprints are used to encode atomic sites and three probabilistic machine learning methods are applied: Parzen-Rosenblatt Window (PRW), Naive Bayesian (NB) and a novel approach called RASCAL (Random Attribute Subsampling Classification ALgorithm). These are implemented by randomly subsampling descriptor space to alleviate the problem often suffered by data mining methods of having to exactly match fingerprints, and in the case of PRW by measuring a distance between feature vectors rather than exact matching. The classifiers have been implemented in CUDA/C++ to exploit the parallel architecture of graphical processing units (GPUs) and are freely available in a public repository. RESULTS: It is shown that for PRW a SoM (Site of Metabolism) is identified in the top two predictions for 85%, 91% and 88% of the CYP 3A4, 2D6 and 2C9 data sets, respectively, with RASCAL giving similar performance of 83%, 91% and 88%, respectively. These results put PRW and RASCAL performance ahead of NB, which gave a much lower classification performance of 51%, 73% and 74%, respectively. CONCLUSIONS: 2D topological fingerprints calculated to a bond depth of 4-6 contain sufficient information to allow the identification of SoMs using classifiers based on relatively small data sets. Thus, the machine learning methods outlined in this paper are conceptually simpler and more efficient than other methods tested, and the use of simple topological descriptors derived from 2D structure gives results competitive with other approaches using more expensive quantum chemical descriptors. The descriptor space subsampling approach and ensemble methodology allow the methods to be applied to molecules more distant from the training data, where data mining would be more likely to fail due to the lack of common fingerprints. The RASCAL algorithm is shown to give equivalent classification performance to PRW but at lower computational expense, allowing it to be applied more efficiently in the ensemble scheme.
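Two ideas in the description above can be illustrated with a minimal Python sketch: scoring an atomic site by its distance to training fingerprints rather than by exact matching, and counting a molecule as a hit when a known SoM appears among its top two ranked atoms. This is not the authors' CUDA/C++ implementation. The Gaussian kernel on Hamming distance, the bandwidth `h`, and the function names `hamming`, `prw_score` and `top_k_hit` are all assumptions made for illustration.

```python
import math


def hamming(a, b):
    """Count differing positions between two equal-length binary fingerprints."""
    return sum(x != y for x, y in zip(a, b))


def prw_score(query, train_fps, train_labels, h=1.0):
    """Parzen-window style estimate of P(SoM | query).

    Assumption: a Gaussian kernel on Hamming distance with bandwidth h.
    The record above does not state which kernel the paper uses.
    """
    pos = neg = 0.0
    for fp, is_som in zip(train_fps, train_labels):
        weight = math.exp(-hamming(query, fp) ** 2 / (2.0 * h * h))
        if is_som:
            pos += weight
        else:
            neg += weight
    total = pos + neg
    return pos / total if total > 0 else 0.0


def top_k_hit(scores, labels, k=2):
    """True if any of the k highest-scoring atoms is a known SoM."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return any(labels[i] for i in ranked[:k])
```

Because every training fingerprint contributes a weight that decays with distance, a query atom can still be scored when none of its fingerprints matches the training data exactly, which is the property the description attributes to PRW.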
format Online
Article
Text
id pubmed-4047555
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4047555 2014-06-23 Cytochrome P450 site of metabolism prediction from 2D topological fingerprints using GPU accelerated probabilistic classifiers Tyzack, Jonathan D Mussa, Hamse Y Williamson, Mark J Kirchmair, Johannes Glen, Robert C J Cheminform Research Article BioMed Central 2014-05-27 /pmc/articles/PMC4047555/ /pubmed/24959208 http://dx.doi.org/10.1186/1758-2946-6-29 Text en Copyright © 2014 Tyzack et al.; licensee Chemistry Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Cytochrome P450 site of metabolism prediction from 2D topological fingerprints using GPU accelerated probabilistic classifiers
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4047555/
https://www.ncbi.nlm.nih.gov/pubmed/24959208
http://dx.doi.org/10.1186/1758-2946-6-29