Benchmarking of protein descriptor sets in proteochemometric modeling (part 1): comparative study of 13 amino acid descriptor sets

Bibliographic Details
Main authors: van Westen, Gerard JP; Swier, Remco F; Wegner, Jörg K; IJzerman, Adriaan P; van Vlijmen, Herman WT; Bender, Andreas
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3848949/
https://www.ncbi.nlm.nih.gov/pubmed/24059694
http://dx.doi.org/10.1186/1758-2946-5-41

Abstract

BACKGROUND: While a large body of work exists on comparing and benchmarking descriptors of molecular structures, a similar comparison of protein descriptor sets is lacking. Hence, in the current work a total of 13 different protein descriptor sets have been compared with respect to their behavior in perceiving similarities between amino acids. The descriptor sets included in the study are Z-scales (3 variants), VHSE, T-scales, ST-scales, MS-WHIM, FASGAI and BLOSUM, and a novel protein descriptor set termed ProtFP (4 variants). We investigate to what extent descriptor sets show collinear as well as orthogonal behavior via principal component analysis (PCA).

RESULTS: In describing amino acid similarities, MS-WHIM, T-scales and ST-scales show related behavior, as do the VHSE, FASGAI, and ProtFP (PCA3) descriptor sets. Conversely, the ProtFP (PCA5), ProtFP (PCA8), Z-scales (Binned), and BLOSUM descriptor sets behave distinctly from one another as well as from both of the clusters above. Generally, using more principal components (>3 per amino acid, per descriptor) leads to significant differences in the way amino acids are described, even though each of the later principal components captures less of the variation in the original input data.

CONCLUSION: This work compares how similarly (and how differently) currently available amino acid descriptor sets behave when converting structure to property space. The results enable molecular modelers to select suitable amino acid descriptor sets for structure-activity analyses, e.g. sets showing complementary behavior.
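
The abstract describes the comparison only at a high level. One possible way to make "descriptor sets show related behavior" concrete is sketched below in Python with NumPy: each descriptor set (a 20-row matrix, one row per amino acid) is reduced to its 190 pairwise amino acid distances, and these distance profiles are then placed in a shared PCA space, where sets that perceive amino acid similarity alike land close together. This is an illustration under stated assumptions, not the authors' procedure: the distance-profile representation, the z-scoring step, the function names, and the random "property" matrix are all hypothetical, and no published descriptor values are used. The ProtFP-style variants assume, as the names PCA3/PCA5/PCA8 suggest, that each keeps the first k principal components of an amino acid property matrix.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def pca_scores(x, n_components):
    """First n_components PCA scores of the rows of x, plus the fraction
    of total variance captured by each of those components."""
    centered = x - x.mean(axis=0)
    # SVD of the centered matrix; singular values come back sorted descending.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


def distance_profile(descriptors):
    """Upper triangle of the amino acid Euclidean distance matrix as one
    vector (20 * 19 / 2 = 190 pairwise distances for 20 amino acids)."""
    diff = descriptors[:, None, :] - descriptors[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))
    i, j = np.triu_indices(len(descriptors), k=1)
    return dist[i, j]


def compare_descriptor_sets(sets, n_components=2):
    """Hypothetical helper: place every descriptor set in one shared PCA
    space based on its pairwise amino acid distances."""
    names = list(sets)
    profiles = np.array([distance_profile(sets[name]) for name in names])
    # Assumption: z-score each profile so sets on different numeric scales are comparable.
    profiles = (profiles - profiles.mean(axis=1, keepdims=True)) / profiles.std(
        axis=1, keepdims=True
    )
    scores, explained = pca_scores(profiles, n_components)
    return names, scores, explained


# Hypothetical toy data: 30 random "physicochemical properties" per amino acid.
rng = np.random.default_rng(0)
properties = rng.normal(size=(len(AMINO_ACIDS), 30))

# ProtFP-style variants, assumed here to be the first k PCA scores of the property matrix.
sets = {f"PCA{k}": pca_scores(properties, k)[0] for k in (3, 5, 8)}
sets["unrelated"] = rng.normal(size=(len(AMINO_ACIDS), 5))

names, scores, explained = compare_descriptor_sets(sets)
for name, (pc1, pc2) in zip(names, scores):
    print(f"{name:>10}: PC1={pc1:+.2f}  PC2={pc2:+.2f}")
print("variance explained:", np.round(explained, 3))
```

With real data, each toy matrix would be replaced by a published descriptor table of 20 rows. Comparing distance profiles rather than raw descriptor values is a deliberate choice here: it lets sets with different numbers of dimensions (3 for Z-scales versus 8 for a PCA8 variant) be compared directly.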

Journal: J Cheminform (Research Article)
Published online: 2013-09-23
License: Copyright © 2013 van Westen et al.; licensee Chemistry Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.