The Effects of Individual Differences, Non-Stationarity, and the Importance of Data Partitioning Decisions for Training and Testing of EEG Cross-Participant Models

EEG-based deep learning models have trended toward models that are designed to perform classification on any individual (cross-participant models). However, because EEG varies across participants due to non-stationarity and individual differences, certain guidelines must be followed for partitioning...

Bibliographic Details
Main Authors:	Kamrud, Alexander; Borghetti, Brett; Schubert Kabban, Christine
Format:	Online Article Text
Language:	English
Published:	MDPI 2021
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8125354/ https://www.ncbi.nlm.nih.gov/pubmed/34066595 http://dx.doi.org/10.3390/s21093225

_version_	1783693476632723456
author	Kamrud, Alexander; Borghetti, Brett; Schubert Kabban, Christine
author_facet	Kamrud, Alexander; Borghetti, Brett; Schubert Kabban, Christine
author_sort	Kamrud, Alexander
collection	PubMed
description	EEG-based deep learning models have trended toward models that are designed to perform classification on any individual (cross-participant models). However, because EEG varies across participants due to non-stationarity and individual differences, certain guidelines must be followed for partitioning data into training, validation, and testing sets, in order for cross-participant models to avoid overestimation of model accuracy. Despite this necessity, the majority of EEG-based cross-participant models have not adopted such guidelines. Furthermore, some data repositories may unwittingly contribute to the problem by providing partitioned test and non-test datasets for reasons such as competition support. In this study, we demonstrate how improper dataset partitioning and the resulting improper training, validation, and testing of a cross-participant model leads to overestimated model accuracy. We demonstrate this mathematically, and empirically, using five publicly available datasets. To build the cross-participant models for these datasets, we replicate published results and demonstrate how the model accuracies are significantly reduced when proper EEG cross-participant model guidelines are followed. Our empirical results show that by not following these guidelines, error rates of cross-participant models can be underestimated between 35% and 3900%. This misrepresentation of model performance for the general population potentially slows scientific progress toward truly high-performing classification models.
format	Online Article Text
id	pubmed-8125354
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-81253542021-05-17 The Effects of Individual Differences, Non-Stationarity, and the Importance of Data Partitioning Decisions for Training and Testing of EEG Cross-Participant Models Kamrud, Alexander Borghetti, Brett Schubert Kabban, Christine Sensors (Basel) Article EEG-based deep learning models have trended toward models that are designed to perform classification on any individual (cross-participant models). However, because EEG varies across participants due to non-stationarity and individual differences, certain guidelines must be followed for partitioning data into training, validation, and testing sets, in order for cross-participant models to avoid overestimation of model accuracy. Despite this necessity, the majority of EEG-based cross-participant models have not adopted such guidelines. Furthermore, some data repositories may unwittingly contribute to the problem by providing partitioned test and non-test datasets for reasons such as competition support. In this study, we demonstrate how improper dataset partitioning and the resulting improper training, validation, and testing of a cross-participant model leads to overestimated model accuracy. We demonstrate this mathematically, and empirically, using five publicly available datasets. To build the cross-participant models for these datasets, we replicate published results and demonstrate how the model accuracies are significantly reduced when proper EEG cross-participant model guidelines are followed. Our empirical results show that by not following these guidelines, error rates of cross-participant models can be underestimated between 35% and 3900%. This misrepresentation of model performance for the general population potentially slows scientific progress toward truly high-performing classification models. MDPI 2021-05-06 /pmc/articles/PMC8125354/ /pubmed/34066595 http://dx.doi.org/10.3390/s21093225 Text en © 2021 by the authors. 
https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle	Article Kamrud, Alexander Borghetti, Brett Schubert Kabban, Christine The Effects of Individual Differences, Non-Stationarity, and the Importance of Data Partitioning Decisions for Training and Testing of EEG Cross-Participant Models
title	The Effects of Individual Differences, Non-Stationarity, and the Importance of Data Partitioning Decisions for Training and Testing of EEG Cross-Participant Models
title_sort	effects of individual differences, non-stationarity, and the importance of data partitioning decisions for training and testing of eeg cross-participant models
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8125354/ https://www.ncbi.nlm.nih.gov/pubmed/34066595 http://dx.doi.org/10.3390/s21093225
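
The description above says that data for cross-participant models must be partitioned into training, validation, and testing sets according to certain guidelines, but it does not spell them out. The sketch below assumes the core requirement is that each participant's recordings fall into exactly one partition, so the model is always evaluated on people it has never seen. The function name `split_by_participant`, the fraction parameters, and the toy participant IDs are all hypothetical and are not taken from the article.

```python
import random
from collections import defaultdict


def split_by_participant(participant_ids, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Assign whole participants to train/val/test; return sample indices per partition."""
    # Group sample indices by the participant who produced them.
    by_participant = defaultdict(list)
    for index, pid in enumerate(participant_ids):
        by_participant[pid].append(index)

    # Shuffle participants (not samples) so each person lands in exactly one partition.
    participants = sorted(by_participant)
    random.Random(seed).shuffle(participants)

    n_test = max(1, round(len(participants) * test_fraction))
    n_val = max(1, round(len(participants) * val_fraction))
    if n_test + n_val >= len(participants):
        raise ValueError("not enough participants for three disjoint partitions")

    groups = {
        "test": participants[:n_test],
        "val": participants[n_test:n_test + n_val],
        "train": participants[n_test + n_val:],
    }
    return {
        name: sorted(i for pid in members for i in by_participant[pid])
        for name, members in groups.items()
    }


if __name__ == "__main__":
    # Hypothetical toy data: 10 participants, 5 EEG epochs each.
    ids = [f"P{p:02d}" for p in range(10) for _ in range(5)]
    parts = split_by_participant(ids)
    seen = {name: {ids[i] for i in idx} for name, idx in parts.items()}
    # No participant contributes data to more than one partition.
    assert seen["train"].isdisjoint(seen["val"])
    assert seen["train"].isdisjoint(seen["test"])
    assert seen["val"].isdisjoint(seen["test"])
    print({name: sorted(members) for name, members in seen.items()})
```

By contrast, shuffling individual epochs before splitting would typically place epochs from the same person in both the training and test sets, which is the kind of partitioning the description associates with overestimated accuracy.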
