
Current approaches for executing big data science projects—a systematic literature review


Bibliographic Details
Main authors: Saltz, Jeffrey S., Krasteva, Iva
Format: Online, Article, Text
Language: English
Published: PeerJ Inc. 2022
Subjects: Data Mining and Machine Learning
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9044260/
https://www.ncbi.nlm.nih.gov/pubmed/35494858
http://dx.doi.org/10.7717/peerj-cs.862
collection PubMed
description There is an increasing number of big data science projects aiming to create value for organizations by improving decision making, streamlining costs or enhancing business processes. However, many of these projects fail to deliver the expected value. It has been observed that a key reason many data science projects don't succeed is not technical in nature, but rather, the process aspect of the project. The lack of established and mature methodologies for executing data science projects has been frequently noted as a reason for these project failures. To help move the field forward, this study presents a systematic review of research focused on the adoption of big data science process frameworks. The goal of the review was to identify (1) the key themes with respect to current research on how teams execute data science projects, (2) the most common approaches regarding how data science projects are organized, managed and coordinated, (3) the activities involved in a data science project's life cycle, and (4) the implications for future research in this field. In short, the review identified 68 primary studies thematically classified into six categories. Two of the themes (workflow and agility) accounted for approximately 80% of the identified studies. The findings regarding workflow approaches consist mainly of adaptations to CRISP-DM (vs entirely new proposed methodologies). With respect to agile approaches, most of the studies only explored the conceptual benefits of using an agile approach in a data science project (vs actually evaluating an agile framework being used in a data science context). Hence, one finding from this research is that future research should explore how to best achieve the theorized benefits of agility. Another finding is the need to explore how to efficiently combine workflow and agile frameworks within a data science context to achieve a more comprehensive approach for project execution.
id pubmed-9044260
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-9044260 2022-04-28. Journal: PeerJ Comput Sci. Subject: Data Mining and Machine Learning. Published by PeerJ Inc. on 2022-02-21. Language: en.
© 2022 Saltz and Krasteva. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
topic Data Mining and Machine Learning