Analysis and Comparison of Vector Space and Metric Space Representations in QSAR Modeling

The performance of quantitative structure–activity relationship (QSAR) models largely depends on the relevance of the selected molecular representation used as input data matrices. This work presents a thorough comparative analysis of two main categories of molecular representations (vector space an...

Bibliographic Details
Main Authors:	Kausar, Samina; Falcao, Andre O.
Format:	Online Article Text
Language:	English
Published:	MDPI 2019
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6539555/ https://www.ncbi.nlm.nih.gov/pubmed/31052325 http://dx.doi.org/10.3390/molecules24091698

Description:	The performance of quantitative structure–activity relationship (QSAR) models largely depends on the relevance of the selected molecular representation used as input data matrices. This work presents a thorough comparative analysis of two main categories of molecular representations (vector space and metric space) for fitting robust machine learning models in QSAR problems. For the assessment of these methods, seven different molecular representations that included RDKit descriptors, five different fingerprint types (MACCS, PubChem, FP2-based, Atom Pair, and ECFP4), and a graph matching approach (non-contiguous atom matching structure similarity; NAMS) in both vector space and metric space were subjected to state-of-the-art machine learning methods that included different dimensionality reduction methods (feature selection and linear dimensionality reduction). Five distinct QSAR data sets were used for direct assessment and analysis. Results show that, in general, metric-space and vector-space representations are able to produce equivalent models, but there are significant differences between individual approaches. The NAMS-based similarity approach consistently outperformed most fingerprint representations in model quality, closely followed by Atom Pair fingerprints. To further verify these findings, the metric space-based models were fitted to the same data sets with the closest neighbors removed. These latter results further strengthened the above conclusions. The metric space graph-based approach appeared significantly superior to the other representations, albeit at a significant computational cost.
Journal:	Molecules
Publication Date:	2019-04-30
Collection:	PubMed
Institution:	National Center for Biotechnology Information
Record ID:	pubmed-6539555
Record Format:	MEDLINE/PubMed
License:	© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
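
Note: The distinction the abstract draws between vector-space and metric-space representations can be illustrated with a minimal sketch. It assumes one common reading of the terms: a vector-space input describes each molecule by its own features, while a metric-space input describes each molecule by its similarity to the other molecules in the data set. The molecule names, the 8-bit fingerprints, the function names, and the choice of Tanimoto similarity are all hypothetical illustrations; none of them is taken from the article, which works with RDKit descriptors, several fingerprint types, and NAMS.

```python
from typing import Dict, List

# Hypothetical 8-bit fingerprints for three made-up molecules (illustration only).
FINGERPRINTS: Dict[str, List[int]] = {
    "mol_a": [1, 1, 0, 0, 1, 0, 0, 1],
    "mol_b": [1, 1, 0, 0, 0, 0, 0, 1],
    "mol_c": [0, 0, 1, 1, 0, 1, 0, 0],
}


def tanimoto(x: List[int], y: List[int]) -> float:
    """Tanimoto similarity of two equal-length binary fingerprints."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    # Two all-zero fingerprints are treated as identical (an assumption).
    return both / either if either else 1.0


def vector_space_matrix(fps: Dict[str, List[int]]) -> List[List[int]]:
    """One row per molecule, one column per fingerprint bit."""
    return [list(bits) for bits in fps.values()]


def metric_space_matrix(fps: Dict[str, List[int]]) -> List[List[float]]:
    """One row per molecule, one column per reference molecule (pairwise similarity)."""
    names = list(fps)
    return [[tanimoto(fps[a], fps[b]) for b in names] for a in names]


if __name__ == "__main__":
    for row in vector_space_matrix(FINGERPRINTS):
        print(row)
    for row in metric_space_matrix(FINGERPRINTS):
        print([round(value, 2) for value in row])
```

In this sketch the vector-space matrix has as many columns as fingerprint bits, whereas the metric-space matrix is square, with as many columns as molecules. Either matrix could then be passed to a learning method, optionally after feature selection or dimensionality reduction.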
