Comparison of Resampling Techniques for Imbalanced Datasets in Machine Learning: Application to Epileptogenic Zone Localization From Interictal Intracranial EEG Recordings in Patients With Focal Epilepsy

Aim: In neuroscience research, data are quite often characterized by an imbalanced distribution between the majority and minority classes, an issue that can limit or even worsen the prediction performance of machine learning methods. Different resampling procedures have been developed to face this p...

Bibliographic Details
Main authors:	Varotto, Giulia; Susi, Gianluca; Tassi, Laura; Gozzo, Francesca; Franceschetti, Silvana; Panzica, Ferruccio
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2021
Subjects:	Neuroscience
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8641296/ https://www.ncbi.nlm.nih.gov/pubmed/34867255 http://dx.doi.org/10.3389/fninf.2021.715421

author	Varotto, Giulia; Susi, Gianluca; Tassi, Laura; Gozzo, Francesca; Franceschetti, Silvana; Panzica, Ferruccio
collection	PubMed
description	Aim: In neuroscience research, data are quite often characterized by an imbalanced distribution between the majority and minority classes, an issue that can limit or even worsen the prediction performance of machine learning methods. Different resampling procedures have been developed to address this problem, and much work has been done to compare their effectiveness in different scenarios. Notably, the robustness of such techniques has been tested across a wide variety of datasets, without considering the performance on each specific dataset. In this study, we compare the performance of different resampling procedures for the imbalanced domain in stereo-electroencephalography (SEEG) recordings of patients with focal epilepsies who underwent surgery. Methods: We considered data obtained by network analysis of interictal SEEG recorded from 10 patients with drug-resistant focal epilepsies, for a supervised classification problem aimed at distinguishing between the epileptogenic and non-epileptogenic brain regions in interictal conditions. We investigated the effectiveness of five oversampling and five undersampling procedures, using 10 different machine learning classifiers. Moreover, six specific ensemble methods for the imbalanced domain were also tested. To compare the performances, Area under the ROC curve (AUC), F-measure, Geometric Mean, and Balanced Accuracy were considered. Results: Both types of resampling procedure improved performance with respect to the original dataset. The oversampling procedures were found to be more sensitive to the type of classification method employed, with Adaptive Synthetic Sampling (ADASYN) exhibiting the best performance. All the undersampling approaches were more robust than the oversampling approaches across the different classifiers, with Random Undersampling (RUS) exhibiting the best performance despite being the simplest and most basic resampling method. Conclusions: The application of machine learning techniques that take into consideration the balance of features by resampling is beneficial and leads to more accurate localization of the epileptogenic zone from interictal periods. In addition, our results highlight the importance of the type of classification method that must be used together with the resampling to maximize the benefit to the outcome.
format	Online Article Text
id	pubmed-8641296
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-8641296 2021-12-04 Front Neuroinform Neuroscience Frontiers Media S.A. 2021-11-19 /pmc/articles/PMC8641296/ /pubmed/34867255 http://dx.doi.org/10.3389/fninf.2021.715421 Text en Copyright © 2021 Varotto, Susi, Tassi, Gozzo, Franceschetti and Panzica. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Comparison of Resampling Techniques for Imbalanced Datasets in Machine Learning: Application to Epileptogenic Zone Localization From Interictal Intracranial EEG Recordings in Patients With Focal Epilepsy
topic	Neuroscience
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8641296/ https://www.ncbi.nlm.nih.gov/pubmed/34867255 http://dx.doi.org/10.3389/fninf.2021.715421
