Software reusability dataset based on static analysis metrics and reuse rate information
The widely adopted component-based development paradigm considers the reuse of proper software components as a primary criterion for successful software development. As a result, various research efforts are directed towards evaluating the extent to which a software component is reusable. Prior effo...
Main authors: | Papamichail, Michail D., Diamantopoulos, Themistoklis, Symeonidis, Andreas L. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2019 |
Subjects: | Computer Science |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6838442/ https://www.ncbi.nlm.nih.gov/pubmed/31720337 http://dx.doi.org/10.1016/j.dib.2019.104687 |
_version_ | 1783467224860721152 |
---|---|
author | Papamichail, Michail D. Diamantopoulos, Themistoklis Symeonidis, Andreas L. |
author_facet | Papamichail, Michail D. Diamantopoulos, Themistoklis Symeonidis, Andreas L. |
author_sort | Papamichail, Michail D. |
collection | PubMed |
description | The widely adopted component-based development paradigm considers the reuse of proper software components as a primary criterion for successful software development. As a result, various research efforts are directed towards evaluating the extent to which a software component is reusable. Prior efforts follow expert-based approaches; however, the continuously increasing open-source software initiative allows the introduction of data-driven alternatives. In this context, we have generated a dataset that harnesses information residing in online code hosting facilities and introduces the actual reuse rate of software components as a measure of their reusability. To do so, we have analyzed the most popular projects included in the Maven registry and have computed, using the SourceMeter tool [2], a large number of static analysis metrics at both class and package levels that quantify six major source code properties: complexity, cohesion, coupling, inheritance, documentation and size. For these projects we additionally computed their reuse rate using our self-developed code search engine, AGORA [5]. The generated dataset contains analysis information regarding more than 24,000 classes and 2,000 packages, and can thus serve as the information basis for the design and development of data-driven reusability evaluation methodologies. The dataset is related to the research article entitled “Measuring the Reusability of Software Components using Static Analysis Metrics and Reuse Rate Information” [1]. |
format | Online Article Text |
id | pubmed-6838442 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-6838442 2019-11-12 Software reusability dataset based on static analysis metrics and reuse rate information Papamichail, Michail D. Diamantopoulos, Themistoklis Symeonidis, Andreas L. Data Brief Computer Science The widely adopted component-based development paradigm considers the reuse of proper software components as a primary criterion for successful software development. As a result, various research efforts are directed towards evaluating the extent to which a software component is reusable. Prior efforts follow expert-based approaches; however, the continuously increasing open-source software initiative allows the introduction of data-driven alternatives. In this context, we have generated a dataset that harnesses information residing in online code hosting facilities and introduces the actual reuse rate of software components as a measure of their reusability. To do so, we have analyzed the most popular projects included in the Maven registry and have computed, using the SourceMeter tool [2], a large number of static analysis metrics at both class and package levels that quantify six major source code properties: complexity, cohesion, coupling, inheritance, documentation and size. For these projects we additionally computed their reuse rate using our self-developed code search engine, AGORA [5]. The generated dataset contains analysis information regarding more than 24,000 classes and 2,000 packages, and can thus serve as the information basis for the design and development of data-driven reusability evaluation methodologies. The dataset is related to the research article entitled “Measuring the Reusability of Software Components using Static Analysis Metrics and Reuse Rate Information” [1]. 
Elsevier 2019-10-19 /pmc/articles/PMC6838442/ /pubmed/31720337 http://dx.doi.org/10.1016/j.dib.2019.104687 Text en © 2019 The Authors http://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Computer Science Papamichail, Michail D. Diamantopoulos, Themistoklis Symeonidis, Andreas L. Software reusability dataset based on static analysis metrics and reuse rate information |
title | Software reusability dataset based on static analysis metrics and reuse rate information |
title_full | Software reusability dataset based on static analysis metrics and reuse rate information |
title_fullStr | Software reusability dataset based on static analysis metrics and reuse rate information |
title_full_unstemmed | Software reusability dataset based on static analysis metrics and reuse rate information |
title_short | Software reusability dataset based on static analysis metrics and reuse rate information |
title_sort | software reusability dataset based on static analysis metrics and reuse rate information |
topic | Computer Science |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6838442/ https://www.ncbi.nlm.nih.gov/pubmed/31720337 http://dx.doi.org/10.1016/j.dib.2019.104687 |
work_keys_str_mv | AT papamichailmichaild softwarereusabilitydatasetbasedonstaticanalysismetricsandreuserateinformation AT diamantopoulosthemistoklis softwarereusabilitydatasetbasedonstaticanalysismetricsandreuserateinformation AT symeonidisandreasl softwarereusabilitydatasetbasedonstaticanalysismetricsandreuserateinformation |