Visualising data science workflows to support third-party notebook comprehension: an empirical study
Data science is an exploratory and iterative process that often leads to complex and unstructured code. This code is usually poorly documented and, consequently, hard to understand by a third party. In this paper, we first collect empirical evidence for the non-linearity of data science code from re...
Main authors: | Ramasamy, Dhivyabharathi; Sarasua, Cristina; Bacchelli, Alberto; Bernstein, Abraham |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer US 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10034906/ https://www.ncbi.nlm.nih.gov/pubmed/36968214 http://dx.doi.org/10.1007/s10664-023-10289-9 |
_version_ | 1784911309384450048 |
---|---|
author | Ramasamy, Dhivyabharathi Sarasua, Cristina Bacchelli, Alberto Bernstein, Abraham |
author_facet | Ramasamy, Dhivyabharathi Sarasua, Cristina Bacchelli, Alberto Bernstein, Abraham |
author_sort | Ramasamy, Dhivyabharathi |
collection | PubMed |
description | Data science is an exploratory and iterative process that often leads to complex and unstructured code. This code is usually poorly documented and, consequently, hard to understand by a third party. In this paper, we first collect empirical evidence for the non-linearity of data science code from real-world Jupyter notebooks, confirming the need for new approaches that aid in data science code interaction and comprehension. Second, we propose a visualisation method that elucidates implicit workflow information in data science code and assists data scientists in navigating the so-called garden of forking paths in non-linear code. The visualisation also provides information such as the rationale and the identification of the data science pipeline step based on cell annotations. We conducted a user experiment with data scientists to evaluate the proposed method, assessing the influence of (i) different workflow visualisations and (ii) cell annotations on code comprehension. Our results show that visualising the exploration helps the users obtain an overview of the notebook, significantly improving code comprehension. Furthermore, our qualitative analysis provides more insights into the difficulties faced during data science code comprehension. |
format | Online Article Text |
id | pubmed-10034906 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Springer US |
record_format | MEDLINE/PubMed |
spelling | pubmed-100349062023-03-23 Visualising data science workflows to support third-party notebook comprehension: an empirical study Ramasamy, Dhivyabharathi Sarasua, Cristina Bacchelli, Alberto Bernstein, Abraham Empir Softw Eng Article Data science is an exploratory and iterative process that often leads to complex and unstructured code. This code is usually poorly documented and, consequently, hard to understand by a third party. In this paper, we first collect empirical evidence for the non-linearity of data science code from real-world Jupyter notebooks, confirming the need for new approaches that aid in data science code interaction and comprehension. Second, we propose a visualisation method that elucidates implicit workflow information in data science code and assists data scientists in navigating the so-called garden of forking paths in non-linear code. The visualisation also provides information such as the rationale and the identification of the data science pipeline step based on cell annotations. We conducted a user experiment with data scientists to evaluate the proposed method, assessing the influence of (i) different workflow visualisations and (ii) cell annotations on code comprehension. Our results show that visualising the exploration helps the users obtain an overview of the notebook, significantly improving code comprehension. Furthermore, our qualitative analysis provides more insights into the difficulties faced during data science code comprehension. 
Springer US 2023-03-23 2023 /pmc/articles/PMC10034906/ /pubmed/36968214 http://dx.doi.org/10.1007/s10664-023-10289-9 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Ramasamy, Dhivyabharathi Sarasua, Cristina Bacchelli, Alberto Bernstein, Abraham Visualising data science workflows to support third-party notebook comprehension: an empirical study |
title | Visualising data science workflows to support third-party notebook comprehension: an empirical study |
title_full | Visualising data science workflows to support third-party notebook comprehension: an empirical study |
title_fullStr | Visualising data science workflows to support third-party notebook comprehension: an empirical study |
title_full_unstemmed | Visualising data science workflows to support third-party notebook comprehension: an empirical study |
title_short | Visualising data science workflows to support third-party notebook comprehension: an empirical study |
title_sort | visualising data science workflows to support third-party notebook comprehension: an empirical study |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10034906/ https://www.ncbi.nlm.nih.gov/pubmed/36968214 http://dx.doi.org/10.1007/s10664-023-10289-9 |
work_keys_str_mv | AT ramasamydhivyabharathi visualisingdatascienceworkflowstosupportthirdpartynotebookcomprehensionanempiricalstudy AT sarasuacristina visualisingdatascienceworkflowstosupportthirdpartynotebookcomprehensionanempiricalstudy AT bacchellialberto visualisingdatascienceworkflowstosupportthirdpartynotebookcomprehensionanempiricalstudy AT bernsteinabraham visualisingdatascienceworkflowstosupportthirdpartynotebookcomprehensionanempiricalstudy |