
Benchmarking of protein descriptor sets in proteochemometric modeling (part 2): modeling performance of 13 amino acid descriptor sets

BACKGROUND: While a large body of work exists on comparing and benchmarking descriptors of molecular structures, a similar comparison of protein descriptor sets is lacking. Hence, in the current work a total of 13 amino acid descriptor sets have been benchmarked with respect to their ability of esta...


Bibliographic Details
Main authors:	van Westen, Gerard JP; Swier, Remco F; Cortes-Ciriano, Isidro; Wegner, Jörg K; Overington, John P; IJzerman, Adriaan P; van Vlijmen, Herman WT; Bender, Andreas
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015169/ https://www.ncbi.nlm.nih.gov/pubmed/24059743 http://dx.doi.org/10.1186/1758-2946-5-42

author	van Westen, Gerard JP; Swier, Remco F; Cortes-Ciriano, Isidro; Wegner, Jörg K; Overington, John P; IJzerman, Adriaan P; van Vlijmen, Herman WT; Bender, Andreas
collection	PubMed
description	BACKGROUND: While a large body of work exists on comparing and benchmarking descriptors of molecular structures, a similar comparison of protein descriptor sets is lacking. Hence, in the current work a total of 13 amino acid descriptor sets have been benchmarked with respect to their ability to establish bioactivity models. The descriptor sets included in the study are Z-scales (3 variants), VHSE, T-scales, ST-scales, MS-WHIM, FASGAI, BLOSUM, and a novel protein descriptor set (termed ProtFP (4 variants)); in addition, we created and benchmarked three pairs of descriptor combinations. Prediction performance was evaluated in seven structure-activity benchmarks which comprise Angiotensin Converting Enzyme (ACE) dipeptidic inhibitor data, and three proteochemometric data sets, namely (1) GPCR ligands modeled against a GPCR panel, (2) enzyme inhibitors (NNRTIs) with associated bioactivities against a set of HIV enzyme mutants, and (3) enzyme inhibitors (PIs) with associated bioactivities on a large set of HIV enzyme mutants. RESULTS: The amino acid descriptor sets compared here show similar performance (<0.1 log units RMSE difference and <0.1 difference in MCC), while errors for individual proteins were in some cases found to be larger than those resulting from descriptor set differences (>0.3 log units RMSE difference and >0.7 difference in MCC). Combining different descriptor sets generally leads to better modeling performance than utilizing individual sets. The best performers were Z-scales (3) combined with ProtFP (Feature), or Z-scales (3) combined with an average Z-scale value for each target, while ProtFP (PCA8), ST-scales, and ProtFP (Feature) rank last. CONCLUSIONS: While amino acid descriptor sets capture different aspects of amino acids, their ability to be used for bioactivity modeling is still, on average, surprisingly similar. Still, combining sets describing complementary information leads to a small but consistent improvement in modeling performance (average MCC 0.01 better, average RMSE 0.01 log units lower). Finally, performance differences exist between the targets compared, thereby underlining that choosing an appropriate descriptor set is of fundamental importance for bioactivity modeling, from both the ligand and the protein side.
format	Online Article Text
id	pubmed-4015169
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4015169 2014-05-10 J Cheminform Research Article BioMed Central 2013-09-24 /pmc/articles/PMC4015169/ /pubmed/24059743 http://dx.doi.org/10.1186/1758-2946-5-42 Text en Copyright © 2013 van Westen et al.; licensee Chemistry Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Benchmarking of protein descriptor sets in proteochemometric modeling (part 2): modeling performance of 13 amino acid descriptor sets
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015169/ https://www.ncbi.nlm.nih.gov/pubmed/24059743 http://dx.doi.org/10.1186/1758-2946-5-42
