
Software Requirements Classification Using Machine Learning Algorithms

The correct classification of requirements has become an essential task within software engineering. This study compares text feature extraction techniques and machine learning algorithms on the requirements classification problem in requirements engineering, to answer two major questions: “W...


Bibliographic Details
Main authors: Dias Canedo, Edna, Cordeiro Mendes, Bruno
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7597130/
https://www.ncbi.nlm.nih.gov/pubmed/33286826
http://dx.doi.org/10.3390/e22091057
_version_ 1783602270313644032
author Dias Canedo, Edna
Cordeiro Mendes, Bruno
author_facet Dias Canedo, Edna
Cordeiro Mendes, Bruno
author_sort Dias Canedo, Edna
collection PubMed
description The correct classification of requirements has become an essential task within software engineering. This study compares text feature extraction techniques and machine learning algorithms on the requirements classification problem in requirements engineering, to answer two major questions: “Which works best (Bag of Words (BoW) vs. Term Frequency–Inverse Document Frequency (TF-IDF) vs. Chi Squared (χ²)) for classifying Software Requirements into Functional Requirements (FR) and Non-Functional Requirements (NF), and the sub-classes of Non-Functional Requirements?” and “Which Machine Learning Algorithm provides the best performance for the requirements classification task?”. The data used for the research was PROMISE_exp, a recently created dataset that expands the well-known PROMISE repository of labeled software requirements. All documents in the dataset were cleaned with a set of normalization steps; the two feature extraction techniques used were BoW and TF-IDF, and the feature selection technique was χ². The algorithms used for classification were Logistic Regression (LR), Support Vector Machine (SVM), Multinomial Naive Bayes (MNB) and k-Nearest Neighbors (kNN). The novelty of our work lies in the data used to perform the experiment, the details of the steps needed to reproduce the classification, and the comparison between BoW, TF-IDF and χ² for this repository, which has not been covered by other studies. This work will serve as a reference for the software engineering community and will help other researchers to understand the requirement classification process. We found that TF-IDF followed by LR gave the best classification results for differentiating requirements, with an F-measure of 0.91 in binary classification (tying with SVM in that case), 0.74 in NF classification and 0.78 in general classification. As future work we intend to compare more algorithms and new ways to improve the precision of our models.
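For readers unfamiliar with the terms in the description, the following is a minimal sketch, using only the Python standard library, of how BoW counts, TF-IDF weights and the F-measure can be computed. It is not the authors' implementation: the whitespace tokenizer, the unsmoothed idf = ln(N / df) formula, the single-class F-measure and all function names are assumptions chosen for illustration, and the record does not state which variants the study used.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for the
    # normalization steps mentioned in the description.
    return text.lower().split()


def bag_of_words(docs):
    # BoW: one term-count mapping per document.
    return [Counter(tokenize(d)) for d in docs]


def tf_idf(docs):
    # Textbook TF-IDF: raw term count times ln(N / df), no smoothing.
    counts = bag_of_words(docs)
    n_docs = len(docs)
    # Iterating a Counter yields each distinct term once per document,
    # so this counts the number of documents containing each term.
    df = Counter(term for c in counts for term in c)
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in c.items()}
        for c in counts
    ]


def f_measure(y_true, y_pred, positive):
    # F-measure (F1) for one class: harmonic mean of precision and recall.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical requirement sentences, not taken from PROMISE_exp.
docs = [
    "the system shall export reports",
    "the system shall respond within two seconds",
]
weights = tf_idf(docs)
score = f_measure(["FR", "NF", "FR", "NF"], ["FR", "FR", "FR", "NF"], "FR")
```

In this sketch, terms shared by every document (such as "the") receive a TF-IDF weight of zero, while BoW would still count them. That difference is the kind the study's comparison of feature extraction techniques examines.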
format Online
Article
Text
id pubmed-7597130
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-75971302020-11-09 Software Requirements Classification Using Machine Learning Algorithms Dias Canedo, Edna Cordeiro Mendes, Bruno Entropy (Basel) Article MDPI 2020-09-21 /pmc/articles/PMC7597130/ /pubmed/33286826 http://dx.doi.org/10.3390/e22091057 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Software Requirements Classification Using Machine Learning Algorithms
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7597130/
https://www.ncbi.nlm.nih.gov/pubmed/33286826
http://dx.doi.org/10.3390/e22091057