Context-based preprocessing of molecular docking data


Bibliographic Details
Main authors: Winck, Ana T, Machado, Karina S, de Souza, Osmar Norberto, Ruiz, Duncan D
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3909228/
https://www.ncbi.nlm.nih.gov/pubmed/24564276
http://dx.doi.org/10.1186/1471-2164-14-S6-S6
author Winck, Ana T
Machado, Karina S
de Souza, Osmar Norberto
Ruiz, Duncan D
collection PubMed
description BACKGROUND: Data preprocessing is a major step in data mining. In data preprocessing, several known techniques can be applied, or new ones developed, to improve data quality such that the mining results become more accurate and intelligible. Bioinformatics is one area with a high demand for generation of comprehensive models from large datasets. In this article, we propose a context-based data preprocessing approach to mine data from molecular docking simulation results. The test cases used a fully-flexible receptor (FFR) model of Mycobacterium tuberculosis InhA enzyme (FFR_InhA) and four different ligands. RESULTS: We generated an initial set of attributes as well as their respective instances. To improve this initial set, we applied two selection strategies. The first was based on our context-based approach while the second used the CFS (Correlation-based Feature Selection) machine learning algorithm. Additionally, we produced an extra dataset containing features selected by combining our context strategy and the CFS algorithm. To demonstrate the effectiveness of the proposed method, we evaluated its performance based on various predictive (RMSE, MAE, Correlation, and Nodes) and context (Precision, Recall and FScore) measures. CONCLUSIONS: Statistical analysis of the results shows that the proposed context-based data preprocessing approach significantly improves predictive and context measures and outperforms the CFS algorithm. Context-based data preprocessing improves mining results by producing superior interpretable models, which makes it well-suited for practical applications in molecular docking simulations using FFR models.
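As an illustration of the measures named in the description above, the following is a minimal Python sketch of RMSE, MAE, Pearson correlation, and precision/recall/F-score. This record does not give the paper's formulas, so the definitions below are the standard textbook ones. Treating the context measures as the overlap between a set of selected attributes and a reference set of relevant attributes is an assumption, and all function and attribute names are hypothetical. The "Nodes" measure is not covered.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pearson(actual, predicted):
    """Pearson correlation coefficient (assumes neither input is constant)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def precision_recall_fscore(selected, relevant):
    """Set-overlap precision, recall, and F-score.

    Assumption: `selected` holds the attributes kept by a selection
    strategy and `relevant` holds a reference set of attributes judged
    relevant. The source record does not define these measures.
    """
    selected, relevant = set(selected), set(relevant)
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    fscore = 2 * precision * recall / total if total else 0.0
    return precision, recall, fscore


if __name__ == "__main__":
    # Hypothetical values, for demonstration only (not data from the paper).
    actual = [-9.1, -8.4, -7.9, -10.2]
    predicted = [-8.8, -8.6, -7.5, -9.9]
    print("RMSE:", rmse(actual, predicted))
    print("MAE:", mae(actual, predicted))
    print("Correlation:", pearson(actual, predicted))
    print(precision_recall_fscore({"attr_a", "attr_b", "attr_c"},
                                  {"attr_a", "attr_d"}))
```

Lower RMSE and MAE and a higher correlation indicate better predictions, while a higher F-score indicates that the selected attributes agree more closely with the reference set.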
format Online
Article
Text
id pubmed-3909228
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3909228 2014-02-13 Context-based preprocessing of molecular docking data Winck, Ana T Machado, Karina S de Souza, Osmar Norberto Ruiz, Duncan D BMC Genomics Research
BioMed Central 2013-10-25 /pmc/articles/PMC3909228/ /pubmed/24564276 http://dx.doi.org/10.1186/1471-2164-14-S6-S6 Text en Copyright © 2013 Winck et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Context-based preprocessing of molecular docking data
topic Research