
Adaptations of data mining methodologies: a systematic literature review

Bibliographic details
Main authors: Plotnikova, Veronika; Dumas, Marlon; Milani, Fredrik
Format: Online, Article, Text
Language: English
Published: PeerJ Inc., 2020
Subjects: Data Mining and Machine Learning
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924527/
https://www.ncbi.nlm.nih.gov/pubmed/33816918
http://dx.doi.org/10.7717/peerj-cs.267
Collection: PubMed
Description: The use of end-to-end data mining methodologies such as CRISP-DM, KDD process, and SEMMA has grown substantially over the past decade. However, little is known about how these methodologies are used in practice. In particular, the question of whether data mining methodologies are used ‘as-is’ or adapted for specific purposes has not been thoroughly investigated. This article addresses this gap via a systematic literature review focused on the context in which data mining methodologies are used and the adaptations they undergo. The literature review covers 207 peer-reviewed and ‘grey’ publications. We find that data mining methodologies are primarily applied ‘as-is’. At the same time, we identify various adaptations of data mining methodologies and note that their number is growing rapidly. The dominant adaptation pattern consists of methodology adjustments at a granular level (modifications), followed by extensions of existing methodologies with additional elements. Further, we identify two recurrent purposes for adaptation: (1) adaptations to handle Big Data technologies, tools, and environments (technological adaptations); and (2) adaptations for context-awareness and for integrating data mining solutions into business processes and IT systems (organizational adaptations). The study suggests that standard data mining methodologies do not pay sufficient attention to deployment issues, which play a prominent role when turning data mining models into software products that are integrated into the IT architectures and business processes of organizations. We conclude that refinements of existing methodologies aimed at combining data, technological, and organizational aspects could help to mitigate these gaps.
Institution: National Center for Biotechnology Information
Source: PeerJ Comput Sci (Data Mining and Machine Learning), PeerJ Inc., 2020-05-25. DOI: http://dx.doi.org/10.7717/peerj-cs.267
License: © 2020 Plotnikova et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.