
Knowledge Discovery for Higher Education Student Retention Based on Data Mining: Machine Learning Algorithms and Case Study in Chile


Bibliographic Details
Main authors:	Palacios, Carlos A.; Reyes-Suárez, José A.; Bearzotti, Lorena A.; Leiva, Víctor; Marchant, Carolina
Format:	Online Article Text
Language:	English
Published:	MDPI 2021
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8072774/ https://www.ncbi.nlm.nih.gov/pubmed/33923879 http://dx.doi.org/10.3390/e23040485

_version_	1783683982886436864
author	Palacios, Carlos A.; Reyes-Suárez, José A.; Bearzotti, Lorena A.; Leiva, Víctor; Marchant, Carolina
author_facet	Palacios, Carlos A.; Reyes-Suárez, José A.; Bearzotti, Lorena A.; Leiva, Víctor; Marchant, Carolina
author_sort	Palacios, Carlos A.
collection	PubMed
description	Data mining is employed to extract useful information and to detect patterns in often large data sets, and is closely related to knowledge discovery in databases and data science. In this investigation, we formulate models based on machine learning algorithms to extract relevant information for predicting student retention at various levels, using higher education data and specifying the relevant variables involved in the modeling. Then, we utilize this information to help the process of knowledge discovery. We predict student retention at each of three levels during their first, second, and third years of study, obtaining models with an accuracy that exceeds 80% in all scenarios. These models allow us to adequately predict the level at which dropout occurs. The machine learning algorithms used in this work include decision trees, k-nearest neighbors, logistic regression, naive Bayes, random forest, and support vector machines, of which the random forest technique performs the best. We detect that secondary educational score and the community poverty index are important predictive variables, which have not been previously reported in educational studies of this type. The dropout assessment at various levels reported here is valid for higher education institutions around the world with similar conditions to the Chilean case, where dropout rates affect the efficiency of such institutions. Having the ability to predict dropout based on student data enables these institutions to take preventive measures to avoid dropouts. In the case study, balancing the majority and minority classes improves the performance of the algorithms.
format	Online Article Text
id	pubmed-8072774
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-8072774 2021-04-27 Knowledge Discovery for Higher Education Student Retention Based on Data Mining: Machine Learning Algorithms and Case Study in Chile. Palacios, Carlos A.; Reyes-Suárez, José A.; Bearzotti, Lorena A.; Leiva, Víctor; Marchant, Carolina. Entropy (Basel). Article. MDPI 2021-04-20. /pmc/articles/PMC8072774/ /pubmed/33923879 http://dx.doi.org/10.3390/e23040485 Text en. © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Knowledge Discovery for Higher Education Student Retention Based on Data Mining: Machine Learning Algorithms and Case Study in Chile
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8072774/ https://www.ncbi.nlm.nih.gov/pubmed/33923879 http://dx.doi.org/10.3390/e23040485
