Feature optimization in high dimensional chemical space: statistical and data mining solutions

OBJECTIVES: The primary goal of this experiment is to prioritize molecular descriptors that control the activity of active molecules that could reduce the dimensionality produced during the virtual screening process. It also aims to: (1) develop a methodology for sampling large datasets and the stat...

Bibliographic Details
Main Authors: K. R., Jinuraj, M., Rakhila, M., Dhanalakshmi, R., Sajeev, Gad, Akshata, K., Jayan, P., Muhammed Iqbal, Manuel, Andrew Titus, U. C., Abdul Jaleel
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6044099/
https://www.ncbi.nlm.nih.gov/pubmed/30001749
http://dx.doi.org/10.1186/s13104-018-3535-y
_version_ 1783339414827565056
author K. R., Jinuraj
M., Rakhila
M., Dhanalakshmi
R., Sajeev
Gad, Akshata
K., Jayan
P., Muhammed Iqbal
Manuel, Andrew Titus
U. C., Abdul Jaleel
author_sort K. R., Jinuraj
collection PubMed
description OBJECTIVES: The primary goal of this experiment is to prioritize the molecular descriptors that control the activity of active molecules, and thereby reduce the dimensionality produced during the virtual screening process. It also aims to: (1) develop a methodology for sampling large datasets and for the statistical verification of the sampling process, (2) apply a screening filter to detect molecules with polypharmacological or promiscuous activity. RESULTS: Sampling from a large dataset and its verification were done by applying a Z-test. Molecular descriptors were prioritized using principal component analysis (PCA) by eliminating the least influential ones. The original dimensions were reduced to one-twelfth by the application of PCA. There was a significant improvement in the statistical parameter values of the virtual screening model, which in turn resulted in better screening results. The screened results were further improved by applying the Eli Lilly MedChem rules filter, which removed molecules with polypharmacological or promiscuous activity. It was also shown that similarities in the activity of compounds were due to molecular descriptors that were not apparent in prima facie structural studies. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13104-018-3535-y) contains supplementary material, which is available to authorized users.
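The two statistical steps named in the description (Z-test verification of a sample, and PCA-based prioritization of descriptors) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not give their procedure, so the function names, the loading-based scoring rule, and the `var_threshold` parameter are all assumptions chosen for illustration. Only NumPy and the standard library are used.

```python
import math
import numpy as np


def z_test_sample_mean(sample, pop_mean, pop_std):
    """Two-sided one-sample Z-test of a sample mean against a known
    population mean and standard deviation (hypothetical helper)."""
    n = len(sample)
    z = (float(np.mean(sample)) - pop_mean) / (pop_std / math.sqrt(n))
    # Two-sided p-value under the standard normal: P(|Z| > |z|)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


def rank_descriptors_by_pca(X, var_threshold=0.95):
    """Rank descriptor columns of X (rows = molecules) by their
    variance-weighted squared loadings on the leading components.
    The scoring rule is an assumption, not taken from the paper."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    std = centered.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant columns at zero instead of dividing by 0
    Z = centered / std
    # PCA via SVD of the standardized matrix; rows of Vt are the loadings
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # Smallest number of components whose cumulative variance reaches the threshold
    k = int(np.searchsorted(np.cumsum(explained), var_threshold) + 1)
    k = min(k, len(s))
    scores = (Vt[:k] ** 2 * explained[:k, None]).sum(axis=0)
    order = np.argsort(scores)[::-1]  # most influential descriptor first
    return order, scores, k


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    population = rng.normal(loc=350.0, scale=50.0, size=100_000)  # synthetic values
    sample = rng.choice(population, size=1_000, replace=False)
    z, p = z_test_sample_mean(sample, population.mean(), population.std())
    print(f"z = {z:.3f}, p = {p:.3f}")  # a large p gives no evidence the sample is biased

    descriptors = rng.normal(size=(200, 12))  # synthetic descriptor matrix
    order, scores, k = rank_descriptors_by_pca(descriptors)
    print("components kept:", k, "descriptor ranking:", order.tolist())
```

Descriptors at the end of the returned ranking would be the candidates for elimination. When many descriptors are close to uncorrelated, a high `var_threshold` makes the scores nearly equal, so the ranking is only informative when a few components dominate.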
format Online
Article
Text
id pubmed-6044099
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6044099 2018-07-16 Feature optimization in high dimensional chemical space: statistical and data mining solutions BMC Res Notes Research Note
BioMed Central 2018-07-13 /pmc/articles/PMC6044099/ /pubmed/30001749 http://dx.doi.org/10.1186/s13104-018-3535-y Text en © The Author(s) 2018 Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Feature optimization in high dimensional chemical space: statistical and data mining solutions
topic Research Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6044099/
https://www.ncbi.nlm.nih.gov/pubmed/30001749
http://dx.doi.org/10.1186/s13104-018-3535-y