‘Everything is data’: towards one big data ecosystem using multiple sources of data on higher education in Indonesia

Big data is increasingly promoted as a game changer for the future of science, as the volume of data has exploded in recent years. Big data is characterized by, among other things, data that come from multiple sources and in multiple formats and that conform to the 5 V’s (value, volume, velocity, variety, and veracity...

Bibliographic Details
Main authors: Yunita, Ariana, Santoso, Harry B., Hasibuan, Zainal A.
Format: Online Article Text
Language: English
Published: Springer International Publishing 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9281197/
https://www.ncbi.nlm.nih.gov/pubmed/35855913
http://dx.doi.org/10.1186/s40537-022-00639-7
_version_ 1784746827399036928
author Yunita, Ariana
Santoso, Harry B.
Hasibuan, Zainal A.
collection PubMed
description Big data is increasingly promoted as a game changer for the future of science, as the volume of data has exploded in recent years. Big data is characterized by, among other things, data that come from multiple sources and in multiple formats and that conform to the 5 V’s (value, volume, velocity, variety, and veracity). Big data also comprises structured, semi-structured, and unstructured data. These characteristics form a “big data ecosystem” in which various active nodes are involved. Despite these complex characteristics, studies show that big data has an inherent structure that can be very useful in providing meaningful solutions to various problems. One such problem is anticipating the proper action to take on students’ achievement. It is common practice for lecturers to treat their classes with a “one-size-fits-all” policy and strategy, whereas the degree of students’ understanding, owing to several factors, may not be the same. Furthermore, it is often too late to take action to rescue a student whose achievement is in trouble. This study attempted to gather all possible features involved from multiple data sources: national education databases, reports, webpages and so forth. The multiple data sources comprise data on undergraduate students from 13 provinces in Indonesia, including students’ academic histories, demographic profiles and socioeconomic backgrounds, and institutional information (i.e. level of accreditation, programmes of study, type of university, geographical location). The gathered data are then preprocessed using various techniques to handle missing values, data categorisation, data consistency and data quality assurance, producing a relatively clean and sound big dataset. Principal component analysis (PCA) is employed to reduce the dimensions of the big dataset, and K-Means is then used to reveal clusters (inherent structure) that may occur in it. K-Means analysis suggests seven clusters: (1) very low-risk students, (2) low-risk students, (3) moderate-risk students, (4) fluctuating-risk students, (5) high-risk students, (6) very high-risk students and (7) failing students. The clusters reveal, among other things, (1) a gap between public and private universities across the three regions in Indonesia, (2) a gap between STEM and non-STEM programmes of study, (3) a gap between rural and urban areas, (4) a gap in accreditation status and (5) a gap in the distribution of quality human resources. In further study, we will use the characteristics of each cluster to predict students’ achievement based on students’ profiles, and provide solutions and intervention strategies for students to improve their likely success.
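The pipeline outlined in the description (standardise the features, reduce dimensions with PCA, then cluster with K-Means into seven groups) can be sketched roughly as follows. This is only an illustration under stated assumptions: the record does not say which tools or parameters the authors used, the helper names (`standardize`, `pca`, `kmeans`) are hypothetical, the input is synthetic random numbers standing in for student features, and PCA is computed here by power iteration with deflation purely to keep the example free of third-party libraries.

```python
import random

def standardize(rows):
    """Scale each column to zero mean and unit variance (constant columns become 0)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [(sum((r[j] - means[j]) ** 2 for r in rows) / n) ** 0.5 or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def pca(rows, n_components, iters=200, seed=0):
    """Project centred rows onto leading principal directions.

    Assumption: power iteration with deflation is used for simplicity;
    it is approximate when eigenvalues are close together.
    """
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    # Covariance matrix of the already centred data.
    cov = [[sum(r[a] * r[b] for r in rows) / n for b in range(d)] for a in range(d)]
    components = []
    for _ in range(n_components):
        v = [rng.random() for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = sum(x * x for x in w) ** 0.5 or 1.0
            v = [x / norm for x in w]
        # Rayleigh quotient: variance captured along direction v.
        eigval = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        components.append(v)
        # Deflate so the next pass finds the next direction.
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(d)] for a in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in rows]

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Synthetic stand-in data: 140 hypothetical students, 5 numeric features each.
# These numbers are random and are NOT taken from the study.
data_rng = random.Random(42)
rows = [[data_rng.gauss(0, 1) for _ in range(5)] for _ in range(140)]

reduced = pca(standardize(rows), n_components=2)
labels, centroids = kmeans(reduced, k=7)  # seven clusters, as in the description
print(sorted(set(labels)))
```

The number of retained components (two) and the fixed seeds are arbitrary choices for the sketch; in practice a library implementation of PCA and K-Means would likely be preferable to these hand-written helpers.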
format Online
Article
Text
id pubmed-9281197
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-9281197 2022-07-14 ‘Everything is data’: towards one big data ecosystem using multiple sources of data on higher education in Indonesia Yunita, Ariana Santoso, Harry B. Hasibuan, Zainal A. J Big Data Methodology Springer International Publishing 2022-07-14 2022 /pmc/articles/PMC9281197/ /pubmed/35855913 http://dx.doi.org/10.1186/s40537-022-00639-7 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title ‘Everything is data’: towards one big data ecosystem using multiple sources of data on higher education in Indonesia
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9281197/
https://www.ncbi.nlm.nih.gov/pubmed/35855913
http://dx.doi.org/10.1186/s40537-022-00639-7