Benchmarking of protein descriptor sets in proteochemometric modeling (part 1): comparative study of 13 amino acid descriptor sets
BACKGROUND: While a large body of work exists on comparing and benchmarking descriptors of molecular structures, a similar comparison of protein descriptor sets is lacking. Hence, in the current work a total of 13 different protein descriptor sets have been compared with respect to their behavior...
Main authors: | van Westen, Gerard JP; Swier, Remco F; Wegner, Jörg K; IJzerman, Adriaan P; van Vlijmen, Herman WT; Bender, Andreas |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2013 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3848949/ https://www.ncbi.nlm.nih.gov/pubmed/24059694 http://dx.doi.org/10.1186/1758-2946-5-41 |
_version_ | 1782293852933062656 |
---|---|
author | van Westen, Gerard JP; Swier, Remco F; Wegner, Jörg K; IJzerman, Adriaan P; van Vlijmen, Herman WT; Bender, Andreas |
author_sort | van Westen, Gerard JP |
collection | PubMed |
description | BACKGROUND: While a large body of work exists on comparing and benchmarking descriptors of molecular structures, a similar comparison of protein descriptor sets is lacking. Hence, in the current work a total of 13 different protein descriptor sets have been compared with respect to their behavior in perceiving similarities between amino acids. The descriptor sets included in the study are Z-scales (3 variants), VHSE, T-scales, ST-scales, MS-WHIM, FASGAI and BLOSUM, and a novel protein descriptor set termed ProtFP (4 variants). We investigate to what extent descriptor sets show collinear as well as orthogonal behavior via principal component analysis (PCA). RESULTS: In describing amino acid similarities, MS-WHIM, T-scales and ST-scales show related behavior, as do the VHSE, FASGAI, and ProtFP (PCA3) descriptor sets. Conversely, the ProtFP (PCA5), ProtFP (PCA8), Z-Scales (Binned), and BLOSUM descriptor sets show behavior that is distinct from one another as well as from both of the clusters above. Generally, the use of more principal components (>3 per amino acid, per descriptor) leads to significant differences in the way amino acids are described, even though the later principal components capture less variation per component of the original input data. CONCLUSION: In this work a comparison is provided of how similarly (and differently) currently available amino acid descriptor sets behave when converting structure to property space. The results obtained enable molecular modelers to select suitable amino acid descriptor sets for structure-activity analyses, e.g. those showing complementary behavior. |
format | Online Article Text |
id | pubmed-3848949 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3848949 2013-12-04 Benchmarking of protein descriptor sets in proteochemometric modeling (part 1): comparative study of 13 amino acid descriptor sets van Westen, Gerard JP; Swier, Remco F; Wegner, Jörg K; IJzerman, Adriaan P; van Vlijmen, Herman WT; Bender, Andreas J Cheminform Research Article
BioMed Central 2013-09-23 /pmc/articles/PMC3848949/ /pubmed/24059694 http://dx.doi.org/10.1186/1758-2946-5-41 Text en Copyright © 2013 van Westen et al.; licensee Chemistry Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Benchmarking of protein descriptor sets in proteochemometric modeling (part 1): comparative study of 13 amino acid descriptor sets |
topic | Research Article |