Cargando…

Are phylogenetic trees suitable for chemogenomics analyses of bioactivity data sets: the importance of shared active compounds and choosing a suitable data embedding method, as exemplified on Kinases

BACKGROUND: ‘Phylogenetic trees’ are commonly used for the analysis of chemogenomics datasets and to relate protein targets to each other, based on the (shared) bioactivities of their ligands. However, no real assessment as to the suitability of this representation has been performed yet in this are...

Descripción completa

Detalles Bibliográficos
Autores principales:	Paricharak, Shardul, Klenka, Tom, Augustin, Martin, Patel, Umesh A, Bender, Andreas
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	BioMed Central 2013
Materias:	Research Article
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3900467/ https://www.ncbi.nlm.nih.gov/pubmed/24330772 http://dx.doi.org/10.1186/1758-2946-5-49

_version_	1782300699907850240
author	Paricharak, Shardul Klenka, Tom Augustin, Martin Patel, Umesh A Bender, Andreas
author_facet	Paricharak, Shardul Klenka, Tom Augustin, Martin Patel, Umesh A Bender, Andreas
author_sort	Paricharak, Shardul
collection	PubMed
description	BACKGROUND: ‘Phylogenetic trees’ are commonly used for the analysis of chemogenomics datasets and to relate protein targets to each other, based on the (shared) bioactivities of their ligands. However, no real assessment as to the suitability of this representation has been performed yet in this area. We aimed to address this shortcoming in the current work, as exemplified by a kinase data set, given the importance of kinases in many diseases as well as the availability of large-scale datasets for analysis. In this work, we analyzed a dataset comprising 157 compounds, which have been tested at concentrations of 1 μM and 10 μM against a panel of 225 human protein kinases in full-matrix experiments, aiming to explain kinase promiscuity and selectivity against inhibitors. Compounds were described by chemical features, which were used to represent kinases (i.e. each kinase had an active set of features and an inactive set). RESULTS: Using this representation, a bioactivity-based classification was made of the kinome, which partially resembles previous sequence-based classifications, where particularly kinases from the TK, CDK, CLK and AGC branches cluster together. However, we were also able to show that in approximately 57% of cases, on average 6 kinase inhibitors exhibit activity against kinases which are located at a large distance in the sequence-based classification (at a relative distance of 0.6 – 0.8 on a scale from 0 to 1), but are correctly located closer to each other in our bioactivity-based tree (distance 0 – 0.4). Despite this improvement on sequence-based classification, also the bioactivity-based classification needed further attention: for approximately 80% of all analyzed kinases, kinases classified as neighbors according to the bioactivity-based classification also show high SAR similarity (i.e. a high fraction of shared active compounds and therefore, interaction with similar inhibitors). However, in the remaining ~20% of cases a clear relationship between kinase bioactivity profile similarity and shared active compounds could not be established, which is in agreement with previously published atypical SAR (such as for LCK, FGFR1, AKT2, DAPK1, TGFR1, MK12 and AKT1). CONCLUSIONS: In this work we were hence able to show that (1) targets (here kinases) with few shared activities are difficult to establish neighborhood relationships for, and (2) phylogenetic tree representations make implicit assumptions (i.e. that neighboring kinases exhibit similar interaction profiles with inhibitors) that are not always suitable for analyses of bioactivity space. While both points have been implicitly alluded to before, this is to the information of the authors the first study that explores both points on a comprehensive basis. Excluding kinases with few shared activities improved the situation greatly (the percentage of kinases for which no neighborhood relationship could be established dropped from 20% to only 4%). We can conclude that all of the above findings need to be taken into account when performing chemogenomics analyses, also for other target classes.
format	Online Article Text
id	pubmed-3900467
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-39004672014-01-28 Are phylogenetic trees suitable for chemogenomics analyses of bioactivity data sets: the importance of shared active compounds and choosing a suitable data embedding method, as exemplified on Kinases Paricharak, Shardul Klenka, Tom Augustin, Martin Patel, Umesh A Bender, Andreas J Cheminform Research Article BACKGROUND: ‘Phylogenetic trees’ are commonly used for the analysis of chemogenomics datasets and to relate protein targets to each other, based on the (shared) bioactivities of their ligands. However, no real assessment as to the suitability of this representation has been performed yet in this area. We aimed to address this shortcoming in the current work, as exemplified by a kinase data set, given the importance of kinases in many diseases as well as the availability of large-scale datasets for analysis. In this work, we analyzed a dataset comprising 157 compounds, which have been tested at concentrations of 1 μM and 10 μM against a panel of 225 human protein kinases in full-matrix experiments, aiming to explain kinase promiscuity and selectivity against inhibitors. Compounds were described by chemical features, which were used to represent kinases (i.e. each kinase had an active set of features and an inactive set). RESULTS: Using this representation, a bioactivity-based classification was made of the kinome, which partially resembles previous sequence-based classifications, where particularly kinases from the TK, CDK, CLK and AGC branches cluster together. However, we were also able to show that in approximately 57% of cases, on average 6 kinase inhibitors exhibit activity against kinases which are located at a large distance in the sequence-based classification (at a relative distance of 0.6 – 0.8 on a scale from 0 to 1), but are correctly located closer to each other in our bioactivity-based tree (distance 0 – 0.4). Despite this improvement on sequence-based classification, also the bioactivity-based classification needed further attention: for approximately 80% of all analyzed kinases, kinases classified as neighbors according to the bioactivity-based classification also show high SAR similarity (i.e. a high fraction of shared active compounds and therefore, interaction with similar inhibitors). However, in the remaining ~20% of cases a clear relationship between kinase bioactivity profile similarity and shared active compounds could not be established, which is in agreement with previously published atypical SAR (such as for LCK, FGFR1, AKT2, DAPK1, TGFR1, MK12 and AKT1). CONCLUSIONS: In this work we were hence able to show that (1) targets (here kinases) with few shared activities are difficult to establish neighborhood relationships for, and (2) phylogenetic tree representations make implicit assumptions (i.e. that neighboring kinases exhibit similar interaction profiles with inhibitors) that are not always suitable for analyses of bioactivity space. While both points have been implicitly alluded to before, this is to the information of the authors the first study that explores both points on a comprehensive basis. Excluding kinases with few shared activities improved the situation greatly (the percentage of kinases for which no neighborhood relationship could be established dropped from 20% to only 4%). We can conclude that all of the above findings need to be taken into account when performing chemogenomics analyses, also for other target classes. BioMed Central 2013-12-13 /pmc/articles/PMC3900467/ /pubmed/24330772 http://dx.doi.org/10.1186/1758-2946-5-49 Text en Copyright © 2013 Paricharak et al.; licensee Chemistry Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Paricharak, Shardul Klenka, Tom Augustin, Martin Patel, Umesh A Bender, Andreas Are phylogenetic trees suitable for chemogenomics analyses of bioactivity data sets: the importance of shared active compounds and choosing a suitable data embedding method, as exemplified on Kinases
title	Are phylogenetic trees suitable for chemogenomics analyses of bioactivity data sets: the importance of shared active compounds and choosing a suitable data embedding method, as exemplified on Kinases
title_full	Are phylogenetic trees suitable for chemogenomics analyses of bioactivity data sets: the importance of shared active compounds and choosing a suitable data embedding method, as exemplified on Kinases
title_fullStr	Are phylogenetic trees suitable for chemogenomics analyses of bioactivity data sets: the importance of shared active compounds and choosing a suitable data embedding method, as exemplified on Kinases
title_full_unstemmed	Are phylogenetic trees suitable for chemogenomics analyses of bioactivity data sets: the importance of shared active compounds and choosing a suitable data embedding method, as exemplified on Kinases
title_short	Are phylogenetic trees suitable for chemogenomics analyses of bioactivity data sets: the importance of shared active compounds and choosing a suitable data embedding method, as exemplified on Kinases
title_sort	are phylogenetic trees suitable for chemogenomics analyses of bioactivity data sets: the importance of shared active compounds and choosing a suitable data embedding method, as exemplified on kinases
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3900467/ https://www.ncbi.nlm.nih.gov/pubmed/24330772 http://dx.doi.org/10.1186/1758-2946-5-49
work_keys_str_mv	AT paricharakshardul arephylogenetictreessuitableforchemogenomicsanalysesofbioactivitydatasetstheimportanceofsharedactivecompoundsandchoosingasuitabledataembeddingmethodasexemplifiedonkinases AT klenkatom arephylogenetictreessuitableforchemogenomicsanalysesofbioactivitydatasetstheimportanceofsharedactivecompoundsandchoosingasuitabledataembeddingmethodasexemplifiedonkinases AT augustinmartin arephylogenetictreessuitableforchemogenomicsanalysesofbioactivitydatasetstheimportanceofsharedactivecompoundsandchoosingasuitabledataembeddingmethodasexemplifiedonkinases AT patelumesha arephylogenetictreessuitableforchemogenomicsanalysesofbioactivitydatasetstheimportanceofsharedactivecompoundsandchoosingasuitabledataembeddingmethodasexemplifiedonkinases AT benderandreas arephylogenetictreessuitableforchemogenomicsanalysesofbioactivitydatasetstheimportanceofsharedactivecompoundsandchoosingasuitabledataembeddingmethodasexemplifiedonkinases

Are phylogenetic trees suitable for chemogenomics analyses of bioactivity data sets: the importance of shared active compounds and choosing a suitable data embedding method, as exemplified on Kinases

Ejemplares similares