A Machine Learning Approach to Predictive Modelling of Student Performance

Background - Many factors affect student performance such as the individual’s background, habits, absenteeism and social activities. Using these factors, corrective actions can be determined to improve their performance. This study looks into the effects of these factors in predicting student perfor...

Bibliographic Details
Main Authors: Ng, Hu, bin Mohd Azha, Azmin Alias, Yap, Timothy Tzen Vun, Goh, Vik Tor
Format: Online Article Text
Language: English
Published: F1000 Research Limited 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9194521/
https://www.ncbi.nlm.nih.gov/pubmed/35719314
http://dx.doi.org/10.12688/f1000research.73180.2
_version_ 1784726747503132672
author Ng, Hu
bin Mohd Azha, Azmin Alias
Yap, Timothy Tzen Vun
Goh, Vik Tor
author_facet Ng, Hu
bin Mohd Azha, Azmin Alias
Yap, Timothy Tzen Vun
Goh, Vik Tor
author_sort Ng, Hu
collection PubMed
description Background: Many factors affect student performance, such as the individual's background, habits, absenteeism and social activities. Using these factors, corrective actions can be determined to improve performance. This study presents a data mining approach to identify significant factors and predict student performance, based on two datasets collected from two secondary schools in Portugal. Methods: The two datasets are merged to increase the sample size. Data pre-processing is then performed, and the features are normalized with linear scaling to avoid bias towards heavily weighted attributes. The selected features are assigned to four groups: student background, lifestyle, history of grades, and all features. Next, Boruta feature selection is performed to remove irrelevant features. Finally, classification models based on Support Vector Machine (SVM), Naïve Bayes (NB) and Multilayer Perceptron (MLP) are designed and their performance evaluated. Results: The models were trained and evaluated on an integrated dataset comprising 1044 student records with 33 features, after feature selection. Classification was performed with SVM, NB and MLP using 60-40 and 50-50 train-test splits and 10-fold cross-validation. GridSearchCV was applied for hyperparameter tuning. The performance metrics were accuracy, precision, recall and F1-score. SVM obtained the highest accuracy, with scores of 77%, 80%, 91% and 90% on background, lifestyle, history of grades and all features respectively, using the 50-50 train-test split for binary classification. SVM also obtained the highest accuracy for five-level classification, with 39%, 38%, 73% and 71% for the four categories respectively. The results show that the history of grades has a significant influence on student performance.
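The abstract mentions linear scaling of features and four evaluation metrics without giving formulas. The sketch below illustrates both in plain Python. It is not the authors' code: it assumes "linear scaling" means min-max rescaling to [0, 1], and the function names `min_max_scale` and `binary_metrics`, the absence counts and the pass/fail labels are all hypothetical values made up for illustration.

```python
def min_max_scale(column):
    """Linearly rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical data: absences per student, and pass (1) / fail (0) labels.
absences = [0, 4, 10, 2, 20]
scaled = min_max_scale(absences)

y_true = [1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 0, 1, 1, 0, 1, 1]
scores = binary_metrics(y_true, y_pred)

print(scaled)
print(scores)
```

In practice a library such as scikit-learn provides equivalents of both steps; the hand-written versions here only make the arithmetic visible.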
format Online
Article
Text
id pubmed-9194521
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher F1000 Research Limited
record_format MEDLINE/PubMed
spelling pubmed-9194521 2022-06-16 A Machine Learning Approach to Predictive Modelling of Student Performance Ng, Hu bin Mohd Azha, Azmin Alias Yap, Timothy Tzen Vun Goh, Vik Tor F1000Res Research Article F1000 Research Limited 2022-05-23 /pmc/articles/PMC9194521/ /pubmed/35719314 http://dx.doi.org/10.12688/f1000research.73180.2 Text en Copyright: © 2022 Ng H et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Ng, Hu
bin Mohd Azha, Azmin Alias
Yap, Timothy Tzen Vun
Goh, Vik Tor
A Machine Learning Approach to Predictive Modelling of Student Performance
title A Machine Learning Approach to Predictive Modelling of Student Performance
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9194521/
https://www.ncbi.nlm.nih.gov/pubmed/35719314
http://dx.doi.org/10.12688/f1000research.73180.2