Task-specific information outperforms surveillance-style big data in predictive analytics

Increasingly, human behavior can be monitored through the collection of data from digital devices revealing information on behaviors and locations. In the context of higher education, a growing number of schools and universities collect data on their students with the purpose of assessing or predict...

Bibliographic Details
Main Authors: Bjerre-Nielsen, Andreas, Kassarnig, Valentin, Lassen, David Dreyer, Lehmann, Sune
Format: Online Article Text
Language: English
Published: National Academy of Sciences 2021
Subjects: Physical Sciences
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8040817/
https://www.ncbi.nlm.nih.gov/pubmed/33790010
http://dx.doi.org/10.1073/pnas.2020258118
_version_ 1783677849429868544
author Bjerre-Nielsen, Andreas
Kassarnig, Valentin
Lassen, David Dreyer
Lehmann, Sune
collection PubMed
description Increasingly, human behavior can be monitored through the collection of data from digital devices revealing information on behaviors and locations. In the context of higher education, a growing number of schools and universities collect data on their students with the purpose of assessing or predicting behaviors and academic performance, and the COVID-19–induced move to online education dramatically increases what can be accumulated in this way, raising concerns about students’ privacy. We focus on academic performance and ask whether predictive performance for a given dataset can be achieved with less privacy-invasive, but more task-specific, data. We draw on a unique dataset on a large student population containing both highly detailed measures of behavior and personality and high-quality third-party reported individual-level administrative data. We find that models estimated using the big behavioral data are indeed able to accurately predict academic performance out of sample. However, models using only low-dimensional and arguably less privacy-invasive administrative data perform considerably better and, importantly, do not improve when we add the high-resolution, privacy-invasive behavioral data. We argue that combining big behavioral data with “ground truth” administrative registry data can ideally allow the identification of privacy-preserving task-specific features that can be employed instead of current indiscriminate troves of behavioral data, with better privacy and better prediction resulting.
format Online
Article
Text
id pubmed-8040817
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher National Academy of Sciences
record_format MEDLINE/PubMed
spelling pubmed-8040817 2021-04-20 Proc Natl Acad Sci U S A Physical Sciences National Academy of Sciences 2021-04-06 2021-03-31 /pmc/articles/PMC8040817/ /pubmed/33790010 http://dx.doi.org/10.1073/pnas.2020258118 Text en Copyright © 2021 the Author(s). Published by PNAS.
This open access article is distributed under Creative Commons Attribution License 4.0 (CC BY) (https://creativecommons.org/licenses/by/4.0/).
title Task-specific information outperforms surveillance-style big data in predictive analytics
topic Physical Sciences
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8040817/
https://www.ncbi.nlm.nih.gov/pubmed/33790010
http://dx.doi.org/10.1073/pnas.2020258118