GrimoireLab: A toolset for software development analytics

BACKGROUND: After many years of research on software repositories, the knowledge for building mature, reusable tools that perform data retrieval, storage and basic analytics is readily available. However, there is still room for improvement in the area of reusable tools implementing this knowledge. G...

Bibliographic Details
Main Authors: Dueñas, Santiago, Cosentino, Valerio, Gonzalez-Barahona, Jesus M., del Castillo San Felix, Alvaro, Izquierdo-Cortazar, Daniel, Cañas-Díaz, Luis, Pérez García-Plaza, Alberto
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2021
Subjects: Data Science
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8279145/
https://www.ncbi.nlm.nih.gov/pubmed/34307858
http://dx.doi.org/10.7717/peerj-cs.601
author Dueñas, Santiago
Cosentino, Valerio
Gonzalez-Barahona, Jesus M.
del Castillo San Felix, Alvaro
Izquierdo-Cortazar, Daniel
Cañas-Díaz, Luis
Pérez García-Plaza, Alberto
collection PubMed
description BACKGROUND: After many years of research on software repositories, the knowledge for building mature, reusable tools that perform data retrieval, storage and basic analytics is readily available. However, there is still room for improvement in the area of reusable tools implementing this knowledge. GOAL: To produce a reusable toolset supporting the most common tasks when retrieving, curating and visualizing data from software repositories, allowing for the easy reproduction of data sets ready for more complex analytics, and sparing the researcher or the analyst most of the tasks that can be automated. METHOD: Use our experience in building tools in this domain to identify a collection of scenarios where a reusable toolset would be convenient, and the main components of such a toolset. Then build those components, and refine them incrementally using the feedback from their use in commercial, community-based, and academic environments. RESULTS: GrimoireLab, an efficient toolset composed of five main components, supporting about 30 different kinds of data sources related to software development. It has been tested in many environments, for performing different kinds of studies, and providing different kinds of services. It features a common API for accessing the retrieved data, facilities for relating items from different data sources, semi-structured storage for easing later analysis and reproduction, and basic facilities for visualization, preliminary analysis and drill-down in the data. It is also modular, making it easy to support new kinds of data sources and analysis. CONCLUSIONS: We present a mature toolset, widely tested in the field, that can help to improve the situation in the area of reusable tools for mining software repositories. We show some scenarios where it has already been used. We expect it will help to reduce the effort for doing studies or providing services in this area, leading to advances in reproducibility and comparison of results.
format Online
Article
Text
id pubmed-8279145
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-8279145 2021-07-22 GrimoireLab: A toolset for software development analytics Dueñas, Santiago Cosentino, Valerio Gonzalez-Barahona, Jesus M. del Castillo San Felix, Alvaro Izquierdo-Cortazar, Daniel Cañas-Díaz, Luis Pérez García-Plaza, Alberto PeerJ Comput Sci Data Science PeerJ Inc. 2021-07-09 /pmc/articles/PMC8279145/ /pubmed/34307858 http://dx.doi.org/10.7717/peerj-cs.601 Text en © 2021 Dueñas et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title GrimoireLab: A toolset for software development analytics
topic Data Science