A Data Preparation Methodology in Data Mining Applied to Mortality Population Databases
It is known that data preparation is the most time-consuming phase of the data mining process, using up to 50 % or even up to 70 % of the total project time. Current data mining methodologies are general-purpose, and one of their limitations is that they do not provide guidance on which particu...
Main authors: | Pérez, Joaquín; Iturbide, Emmanuel; Olivares, Víctor; Hidalgo, Miguel; Martínez, Alicia; Almanza, Nelva |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer US, 2015 |
Subjects: | Systems-Level Quality Improvement |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4575356/ https://www.ncbi.nlm.nih.gov/pubmed/26385549 http://dx.doi.org/10.1007/s10916-015-0312-5 |
_version_ | 1782390761923280896 |
---|---|
author | Pérez, Joaquín; Iturbide, Emmanuel; Olivares, Víctor; Hidalgo, Miguel; Martínez, Alicia; Almanza, Nelva |
author_sort | Pérez, Joaquín |
collection | PubMed |
description | It is known that data preparation is the most time-consuming phase of the data mining process, using up to 50 % or even up to 70 % of the total project time. Current data mining methodologies are general-purpose, and one of their limitations is that they do not provide guidance on which particular tasks to carry out in a specific domain. This paper presents a new data preparation methodology oriented to the epidemiological domain, in which we have identified two sets of tasks: General Data Preparation and Specific Data Preparation. For both sets, the Cross-Industry Standard Process for Data Mining (CRISP-DM) is adopted as a guideline. The main contribution of our methodology is fourteen specialized tasks specific to this domain. To validate the proposed methodology, we developed a data mining system and applied the entire process to real mortality databases. The results were encouraging: the use of the methodology reduced some of the time-consuming tasks, and the data mining system revealed previously unknown and potentially useful patterns for the public health services in Mexico. |
format | Online Article Text |
id | pubmed-4575356 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Springer US |
record_format | MEDLINE/PubMed |
spelling | pubmed-4575356 2015-09-23 A Data Preparation Methodology in Data Mining Applied to Mortality Population Databases Pérez, Joaquín; Iturbide, Emmanuel; Olivares, Víctor; Hidalgo, Miguel; Martínez, Alicia; Almanza, Nelva J Med Syst Systems-Level Quality Improvement It is known that data preparation is the most time-consuming phase of the data mining process, using up to 50 % or even up to 70 % of the total project time. Current data mining methodologies are general-purpose, and one of their limitations is that they do not provide guidance on which particular tasks to carry out in a specific domain. This paper presents a new data preparation methodology oriented to the epidemiological domain, in which we have identified two sets of tasks: General Data Preparation and Specific Data Preparation. For both sets, the Cross-Industry Standard Process for Data Mining (CRISP-DM) is adopted as a guideline. The main contribution of our methodology is fourteen specialized tasks specific to this domain. To validate the proposed methodology, we developed a data mining system and applied the entire process to real mortality databases. The results were encouraging: the use of the methodology reduced some of the time-consuming tasks, and the data mining system revealed previously unknown and potentially useful patterns for the public health services in Mexico. Springer US 2015-09-18 2015 /pmc/articles/PMC4575356/ /pubmed/26385549 http://dx.doi.org/10.1007/s10916-015-0312-5 Text en © The Author(s) 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. |
title | A Data Preparation Methodology in Data Mining Applied to Mortality Population Databases |
topic | Systems-Level Quality Improvement |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4575356/ https://www.ncbi.nlm.nih.gov/pubmed/26385549 http://dx.doi.org/10.1007/s10916-015-0312-5 |