A Machine Learning Approach to Predictive Modelling of Student Performance
Background - Many factors affect student performance such as the individual’s background, habits, absenteeism and social activities. Using these factors, corrective actions can be determined to improve their performance. This study looks into the effects of these factors in predicting student perfor...
Main authors: | Ng, Hu; bin Mohd Azha, Azmin Alias; Yap, Timothy Tzen Vun; Goh, Vik Tor |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | F1000 Research Limited, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9194521/ https://www.ncbi.nlm.nih.gov/pubmed/35719314 http://dx.doi.org/10.12688/f1000research.73180.2 |
_version_ | 1784726747503132672 |
---|---|
author | Ng, Hu; bin Mohd Azha, Azmin Alias; Yap, Timothy Tzen Vun; Goh, Vik Tor |
author_facet | Ng, Hu; bin Mohd Azha, Azmin Alias; Yap, Timothy Tzen Vun; Goh, Vik Tor |
author_sort | Ng, Hu |
collection | PubMed |
description | Background - Many factors affect student performance, such as the individual’s background, habits, absenteeism and social activities. Using these factors, corrective actions can be determined to improve performance. This study examines the effect of these factors on predicting student performance and presents a data mining approach to identify significant factors and predict student performance, based on two datasets collected from two secondary schools in Portugal. Methods - The two datasets are merged to increase the sample size. Data pre-processing is then performed and the features are normalized with linear scaling to avoid bias towards heavily weighted attributes. The selected features are assigned to four groups: student background, lifestyle, history of grades and all features. Next, Boruta feature selection is performed to remove irrelevant features. Finally, classification models based on Support Vector Machine (SVM), Naïve Bayes (NB) and Multilayer Perceptron (MLP) are designed and their performance is evaluated. Results - The models were trained and evaluated on an integrated dataset comprising 1044 student records with 33 features, after feature selection. Classification was performed with SVM, NB and MLP using 60-40 and 50-50 train-test splits and 10-fold cross-validation. GridSearchCV was applied for hyperparameter tuning. The performance metrics were accuracy, precision, recall and F1-score. SVM obtained the highest accuracy, with scores of 77%, 80%, 91% and 90% on background, lifestyle, history of grades and all features respectively, using the 50-50 train-test split for binary-level classification. SVM also obtained the highest accuracy for five-level classification, with 39%, 38%, 73% and 71% for the four categories respectively. The results show that the history of grades has a significant influence on student performance. |
format | Online Article Text |
id | pubmed-9194521 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | F1000 Research Limited |
record_format | MEDLINE/PubMed |
spelling | pubmed-9194521 2022-06-16 A Machine Learning Approach to Predictive Modelling of Student Performance Ng, Hu bin Mohd Azha, Azmin Alias Yap, Timothy Tzen Vun Goh, Vik Tor F1000Res Research Article F1000 Research Limited 2022-05-23 /pmc/articles/PMC9194521/ /pubmed/35719314 http://dx.doi.org/10.12688/f1000research.73180.2 Text en Copyright: © 2022 Ng H et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Ng, Hu bin Mohd Azha, Azmin Alias Yap, Timothy Tzen Vun Goh, Vik Tor A Machine Learning Approach to Predictive Modelling of Student Performance |
title | A Machine Learning Approach to Predictive Modelling of Student Performance |
title_full | A Machine Learning Approach to Predictive Modelling of Student Performance |
title_fullStr | A Machine Learning Approach to Predictive Modelling of Student Performance |
title_full_unstemmed | A Machine Learning Approach to Predictive Modelling of Student Performance |
title_short | A Machine Learning Approach to Predictive Modelling of Student Performance |
title_sort | machine learning approach to predictive modelling of student performance |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9194521/ https://www.ncbi.nlm.nih.gov/pubmed/35719314 http://dx.doi.org/10.12688/f1000research.73180.2 |
work_keys_str_mv | AT nghu amachinelearningapproachtopredictivemodellingofstudentperformance AT binmohdazhaazminalias amachinelearningapproachtopredictivemodellingofstudentperformance AT yaptimothytzenvun amachinelearningapproachtopredictivemodellingofstudentperformance AT gohviktor amachinelearningapproachtopredictivemodellingofstudentperformance AT nghu machinelearningapproachtopredictivemodellingofstudentperformance AT binmohdazhaazminalias machinelearningapproachtopredictivemodellingofstudentperformance AT yaptimothytzenvun machinelearningapproachtopredictivemodellingofstudentperformance AT gohviktor machinelearningapproachtopredictivemodellingofstudentperformance |
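
The description field above mentions normalizing features with linear scaling and evaluating classifiers by accuracy, precision, recall and F1-score. The following is a minimal Python sketch of those two steps only, not the authors' implementation. It assumes that "linear scaling" means min-max scaling to [0, 1] and that the binary task uses labels 0 and 1 with 1 as the positive class; the function names `min_max_scale` and `binary_metrics` and the sample values are hypothetical.

```python
from typing import Dict, List, Sequence


def min_max_scale(column: Sequence[float]) -> List[float]:
    """Linearly rescale one feature column to the range [0, 1].

    Assumption: "linear scaling" in the record is read as min-max scaling.
    """
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for labels in {0, 1} (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical feature values and labels, for illustration only.
    absences = [0, 4, 10, 2, 20]
    print(min_max_scale(absences))
    print(binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
```

Scaling each feature to a common range keeps attributes with large raw values from dominating distance-based models such as SVM, which is the bias the description refers to.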