
A Data Preparation Methodology in Data Mining Applied to Mortality Population Databases


Bibliographic Details
Main authors: Pérez, Joaquín; Iturbide, Emmanuel; Olivares, Víctor; Hidalgo, Miguel; Martínez, Alicia; Almanza, Nelva
Format: Online Article Text
Language: English
Published: Springer US 2015
Subjects: Systems-Level Quality Improvement
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4575356/
https://www.ncbi.nlm.nih.gov/pubmed/26385549
http://dx.doi.org/10.1007/s10916-015-0312-5
collection PubMed
description It is known that the data preparation phase is the most time consuming in the data mining process, using up to 50 % or up to 70 % of the total project time. Currently, data mining methodologies are of general purpose and one of their limitations is that they do not provide a guide about what particular task to develop in a specific domain. This paper shows a new data preparation methodology oriented to the epidemiological domain in which we have identified two sets of tasks: General Data Preparation and Specific Data Preparation. For both sets, the Cross-Industry Standard Process for Data Mining (CRISP-DM) is adopted as a guideline. The main contribution of our methodology is fourteen specialized tasks concerning such domain. To validate the proposed methodology, we developed a data mining system and the entire process was applied to real mortality databases. The results were encouraging because it was observed that the use of the methodology reduced some of the time consuming tasks and the data mining system showed findings of unknown and potentially useful patterns for the public health services in Mexico.
format Online
Article
Text
id pubmed-4575356
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Springer US
record_format MEDLINE/PubMed
spelling pubmed-4575356; 2015-09-23; J Med Syst; Systems-Level Quality Improvement; Springer US; 2015-09-18; Text; en; © The Author(s) 2015. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
topic Systems-Level Quality Improvement