Software reusability dataset based on static analysis metrics and reuse rate information

Bibliographic Details
Main authors: Papamichail, Michail D.; Diamantopoulos, Themistoklis; Symeonidis, Andreas L.
Format: Online, Article, Text
Language: English
Published: Elsevier 2019
Subjects: Computer Science
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6838442/
https://www.ncbi.nlm.nih.gov/pubmed/31720337
http://dx.doi.org/10.1016/j.dib.2019.104687
collection PubMed
description The widely adopted component-based development paradigm considers the reuse of proper software components as a primary criterion for successful software development. As a result, various research efforts are directed towards evaluating the extent to which a software component is reusable. Prior efforts follow expert-based approaches; however, the continuously growing open-source software initiative enables the introduction of data-driven alternatives. In this context, we have generated a dataset that harnesses information residing in online code hosting facilities and introduces the actual reuse rate of software components as a measure of their reusability. To do so, we have analyzed the most popular projects included in the Maven registry and have computed a large number of static analysis metrics at both class and package levels using the SourceMeter tool [2]; these metrics quantify six major source code properties: complexity, cohesion, coupling, inheritance, documentation and size. For these projects, we additionally computed their reuse rate using AGORA [5], our self-developed code search engine. The generated dataset contains analysis information regarding more than 24,000 classes and 2,000 packages and can thus be used as the information basis for the design and development of data-driven reusability evaluation methodologies. The dataset is related to the research article entitled “Measuring the Reusability of Software Components using Static Analysis Metrics and Reuse Rate Information” [1].
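Because the dataset pairs static analysis metrics with a reuse rate for each component, one simple data-driven analysis is to rank-correlate each metric with reuse rate. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' methodology. The record layout (`ClassRecord`), the helper names, the choice of one example metric per property (WMC, LCOM5, CBO, DIT, CD, LOC, metric names of the kind SourceMeter reports), and the toy values are all assumptions; they do not describe the dataset's actual schema or contents.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical mapping: one example SourceMeter-style metric per property
# named in the dataset description. The real dataset has many more metrics.
PROPERTY_METRICS = {
    "complexity": "WMC",     # weighted methods per class
    "cohesion": "LCOM5",     # lack of cohesion in methods
    "coupling": "CBO",       # coupling between object classes
    "inheritance": "DIT",    # depth of inheritance tree
    "documentation": "CD",   # comment density
    "size": "LOC",           # lines of code
}


@dataclass
class ClassRecord:
    """Hypothetical row: one class, its metric values and its reuse rate."""
    name: str
    metrics: dict[str, float]
    reuse_rate: float


def ranks(values: list[float]) -> list[float]:
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman correlation computed as the Pearson correlation of ranks.

    Returns NaN when either input is constant (correlation undefined).
    """
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    denom = (var_x * var_y) ** 0.5
    return float("nan") if denom == 0 else cov / denom


def correlate_with_reuse(records: list[ClassRecord]) -> dict[str, float]:
    """Rank-correlate each property's example metric with the reuse rate."""
    reuse = [r.reuse_rate for r in records]
    return {
        prop: spearman([r.metrics[metric] for r in records], reuse)
        for prop, metric in PROPERTY_METRICS.items()
    }


# Toy values invented for illustration; they are not taken from the dataset.
toy_records = [
    ClassRecord("A", {"WMC": 1, "LCOM5": 3, "CBO": 2, "DIT": 1, "CD": 0.1, "LOC": 300}, 0.1),
    ClassRecord("B", {"WMC": 2, "LCOM5": 2, "CBO": 2, "DIT": 2, "CD": 0.2, "LOC": 200}, 0.5),
    ClassRecord("C", {"WMC": 3, "LCOM5": 1, "CBO": 5, "DIT": 1, "CD": 0.3, "LOC": 100}, 0.9),
]

for prop, rho in correlate_with_reuse(toy_records).items():
    print(f"{prop:>13}: {rho:+.3f}")
```

Spearman's rank correlation is used here because it only assumes a monotonic relationship between a metric and reuse rate, which suits skewed counts such as lines of code. With the real dataset, the dictionary keys would need to be replaced by its actual column names.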
id pubmed-6838442
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6838442 2019-11-12 Data Brief Computer Science
Elsevier 2019-10-19 /pmc/articles/PMC6838442/ /pubmed/31720337 http://dx.doi.org/10.1016/j.dib.2019.104687 Text en © 2019 The Authors http://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
topic Computer Science