Context-based preprocessing of molecular docking data

BACKGROUND: Data preprocessing is a major step in data mining. In data preprocessing, several known techniques can be applied, or new ones developed, to improve data quality such that the mining results become more accurate and intelligible. Bioinformatics is one area with a high demand for generati...

Bibliographic Details
Main Authors:	Winck, Ana T; Machado, Karina S; de Souza, Osmar Norberto; Ruiz, Duncan D
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3909228/ https://www.ncbi.nlm.nih.gov/pubmed/24564276 http://dx.doi.org/10.1186/1471-2164-14-S6-S6

author	Winck, Ana T; Machado, Karina S; de Souza, Osmar Norberto; Ruiz, Duncan D
collection	PubMed
description	BACKGROUND: Data preprocessing is a major step in data mining. In data preprocessing, several known techniques can be applied, or new ones developed, to improve data quality such that the mining results become more accurate and intelligible. Bioinformatics is one area with a high demand for generation of comprehensive models from large datasets. In this article, we propose a context-based data preprocessing approach to mine data from molecular docking simulation results. The test cases used a fully-flexible receptor (FFR) model of Mycobacterium tuberculosis InhA enzyme (FFR_InhA) and four different ligands. RESULTS: We generated an initial set of attributes as well as their respective instances. To improve this initial set, we applied two selection strategies. The first was based on our context-based approach while the second used the CFS (Correlation-based Feature Selection) machine learning algorithm. Additionally, we produced an extra dataset containing features selected by combining our context strategy and the CFS algorithm. To demonstrate the effectiveness of the proposed method, we evaluated its performance based on various predictive (RMSE, MAE, Correlation, and Nodes) and context (Precision, Recall and FScore) measures. CONCLUSIONS: Statistical analysis of the results shows that the proposed context-based data preprocessing approach significantly improves predictive and context measures and outperforms the CFS algorithm. Context-based data preprocessing improves mining results by producing superior interpretable models, which makes it well-suited for practical applications in molecular docking simulations using FFR models.
format	Online Article Text
id	pubmed-3909228
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3909228 2014-02-13 Context-based preprocessing of molecular docking data Winck, Ana T; Machado, Karina S; de Souza, Osmar Norberto; Ruiz, Duncan D BMC Genomics Research BioMed Central 2013-10-25 /pmc/articles/PMC3909228/ /pubmed/24564276 http://dx.doi.org/10.1186/1471-2164-14-S6-S6 Text en Copyright © 2013 Winck et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Context-based preprocessing of molecular docking data
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3909228/ https://www.ncbi.nlm.nih.gov/pubmed/24564276 http://dx.doi.org/10.1186/1471-2164-14-S6-S6

Similar Items