
Predicting the protein targets for athletic performance-enhancing substances

BACKGROUND: The World Anti-Doping Agency (WADA) publishes the Prohibited List, a manually compiled international standard of substances and methods prohibited in-competition, out-of-competition and in particular sports. It would be ideal to be able to identify all substances that have one or more pe...


Bibliographic Details
Main authors: Mavridis, Lazaros, Mitchell, John BO
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3701582/
https://www.ncbi.nlm.nih.gov/pubmed/23800040
http://dx.doi.org/10.1186/1758-2946-5-31
author Mavridis, Lazaros
Mitchell, John BO
collection PubMed
description BACKGROUND: The World Anti-Doping Agency (WADA) publishes the Prohibited List, a manually compiled international standard of substances and methods prohibited in-competition, out-of-competition and in particular sports. It would be ideal to be able to identify all substances that have one or more performance-enhancing pharmacological actions in an automated, fast and cost-effective way. Here, we use experimental data derived from the ChEMBL database (~7,000,000 activity records for 1,300,000 compounds) to build a database model that takes into account both structure and experimental information, and use this database to predict both on-target and off-target interactions between these molecules and targets relevant to doping in sport.
RESULTS: The ChEMBL database was screened and eight well-populated categories of activities (Ki, Kd, EC50, ED50, activity, potency, inhibition and IC50) were used for a rule-based filtering process to define the labels “active” or “inactive”. The “active” compounds for each of the ChEMBL families were thereby defined and these populated our bioactivity-based filtered families. A structure-based clustering step was subsequently performed in order to split families with more than one distinct chemical scaffold. This produced refined families, whose members share both a common chemical scaffold and bioactivity against a common target in ChEMBL.
CONCLUSIONS: We have used the Parzen-Rosenblatt machine learning approach to test whether compounds in ChEMBL can be correctly predicted to belong to their appropriate refined families. Validation tests using the refined families gave a significant increase in predictivity compared with the filtered or with the original families. Out of 61,660 queries in our Monte Carlo cross-validation, belonging to 19,639 refined families, 41,300 (66.98%) had the parent family as the top prediction and 53,797 (87.25%) had the parent family in the top four hits. Having thus validated our approach, we used it to identify the protein targets associated with the WADA prohibited classes. For compounds where we do not have experimental data, we use their computed patterns of interaction with protein targets to make predictions of bioactivity. We hope that other groups will test these predictions experimentally in the future.
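The family-assignment step summarized in the abstract can be illustrated with a minimal sketch of a Parzen-Rosenblatt window classifier. It is not the authors' implementation: the abstract does not state which descriptors, kernel or bandwidth were used, so the set-based fingerprints, the Gaussian kernel over Tanimoto distance, the `bandwidth` value, the family names and the helper names (`tanimoto`, `parzen_scores`, `top_k`, `hit_rate`) are all assumptions made for illustration. The last helper only re-derives the hit-rate percentages quoted in the abstract from the reported counts.

```python
import math


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two fingerprints stored as sets of on-bits."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def parzen_scores(query, families, bandwidth=0.3):
    """Score each family by the mean Gaussian kernel value between the query
    and the family's members (kernel and bandwidth are assumed, not sourced)."""
    scores = {}
    for name, members in families.items():
        if not members:
            continue  # an empty family cannot be scored
        total = sum(
            math.exp(-((1.0 - tanimoto(query, m)) ** 2) / (2.0 * bandwidth ** 2))
            for m in members
        )
        scores[name] = total / len(members)
    return scores


def top_k(query, families, k=4, bandwidth=0.3):
    """Return the k best-scoring family names, best first."""
    scores = parzen_scores(query, families, bandwidth)
    return sorted(scores, key=scores.get, reverse=True)[:k]


def hit_rate(hits: int, queries: int) -> float:
    """Percentage of queries counted as hits, rounded to two decimals."""
    return round(100.0 * hits / queries, 2)


# Hypothetical toy families; real refined families come from ChEMBL.
families = {
    "family_A": [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 5})],
    "family_B": [frozenset({10, 11, 12}), frozenset({10, 11, 13})],
    "family_C": [frozenset({20, 21}), frozenset({20, 22})],
}
query = frozenset({1, 2, 3, 6})
ranking = top_k(query, families, k=2)

print(ranking[0])                # best-scoring family for the toy query
print(hit_rate(41300, 61660))    # top-1 rate from the reported counts
print(hit_rate(53797, 61660))    # top-4 rate from the reported counts
```

In a cross-validation of the kind the abstract describes, each held-out compound would be used as `query` against families built from the remaining compounds, and a hit would be counted when the parent family appears within the first one (or first four) entries of the ranking.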
format Online
Article
Text
id pubmed-3701582
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3701582 2013-07-10 Predicting the protein targets for athletic performance-enhancing substances Mavridis, Lazaros Mitchell, John BO J Cheminform Research Article BioMed Central 2013-06-25 /pmc/articles/PMC3701582/ /pubmed/23800040 http://dx.doi.org/10.1186/1758-2946-5-31 Text en Copyright © 2013 Mavridis and Mitchell; licensee Chemistry Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Predicting the protein targets for athletic performance-enhancing substances
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3701582/
https://www.ncbi.nlm.nih.gov/pubmed/23800040
http://dx.doi.org/10.1186/1758-2946-5-31