
Generation of synthetic EEG data for training algorithms supporting the diagnosis of major depressive disorder

INTRODUCTION: Major depressive disorder (MDD) is the most common mental disorder worldwide, leading to impairment in quality and independence of life. Electroencephalography (EEG) biomarkers processed with machine learning (ML) algorithms have been explored for objective diagnoses with promising res...

Bibliographic Details
Main Authors: Carrle, Friedrich Philipp; Hollenbenders, Yasmin; Reichenbach, Alexandra
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10577178/
https://www.ncbi.nlm.nih.gov/pubmed/37849893
http://dx.doi.org/10.3389/fnins.2023.1219133
_version_ 1785121268756905984
author Carrle, Friedrich Philipp
Hollenbenders, Yasmin
Reichenbach, Alexandra
author_facet Carrle, Friedrich Philipp
Hollenbenders, Yasmin
Reichenbach, Alexandra
author_sort Carrle, Friedrich Philipp
collection PubMed
description INTRODUCTION: Major depressive disorder (MDD) is the most common mental disorder worldwide, leading to impairment in quality and independence of life. Electroencephalography (EEG) biomarkers processed with machine learning (ML) algorithms have been explored for objective diagnoses with promising results. However, the generalizability of those models, a prerequisite for clinical application, is restricted by small datasets. One approach to train ML models with good generalizability is complementing the original data with synthetic data produced by generative algorithms. Another advantage of synthetic data is the possibility of publishing the data for other researchers without risking patient data privacy. Synthetic EEG time-series have not yet been generated for two clinical populations like MDD patients and healthy controls. METHODS: We first reviewed 27 studies presenting EEG data augmentation with generative algorithms for classification tasks, like diagnosis, for the possibilities and shortcomings of recent methods. The subsequent empirical study generated EEG time-series based on two public datasets with 30/28 and 24/29 subjects (MDD/controls). To obtain baseline diagnostic accuracies, convolutional neural networks (CNN) were trained with time-series from each dataset. The data were synthesized with generative adversarial networks (GAN) consisting of CNNs. We evaluated the synthetic data qualitatively and quantitatively and finally used it for re-training the diagnostic model. RESULTS: The reviewed studies improved their classification accuracies by 1 to 40% with the synthetic data. Our own diagnostic accuracy improved by up to 10% for one dataset but not significantly for the other. We found a rich repertoire of generative models in the reviewed literature, solving various technical issues. A major shortcoming in the field is the lack of meaningful evaluation metrics for synthetic data. The few studies analyzing the data in the frequency domain, including our own, show that only some features can be reproduced faithfully. DISCUSSION: The systematic review combined with our own investigation provides an overview of the available methods for generating EEG data for a classification task, their possibilities, and shortcomings. The approach is promising and the technical basis is set. For a broad application of these techniques in neuroscience research or clinical application, the methods need fine-tuning facilitated by domain expertise in (clinical) EEG research.
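The frequency-domain evaluation mentioned in the abstract could look roughly like the following. This is a minimal sketch, not the authors' method: the band edges, the helper names `relative_band_power` and `band_power_gap`, the sampling rate, and the toy sine-plus-noise signals standing in for real and synthetic EEG are all assumptions made for illustration.

```python
import numpy as np

# Conventional EEG band edges in Hz (assumed; the record does not say which bands were used).
BANDS = {"delta": (1.0, 4.0), "theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}


def relative_band_power(epochs, fs):
    """Mean relative power per band for epochs of shape (n_epochs, n_samples)."""
    epochs = np.asarray(epochs, dtype=float)
    epochs = epochs - epochs.mean(axis=1, keepdims=True)  # remove the DC offset
    window = np.hanning(epochs.shape[1])                   # taper to limit spectral leakage
    spectrum = np.abs(np.fft.rfft(epochs * window, axis=1)) ** 2
    freqs = np.fft.rfftfreq(epochs.shape[1], d=1.0 / fs)
    # Normalise by the power inside the full 1-30 Hz range covered by BANDS.
    in_range = (freqs >= 1.0) & (freqs < 30.0)
    total = spectrum[:, in_range].sum(axis=1)
    result = {}
    for name, (low, high) in BANDS.items():
        mask = (freqs >= low) & (freqs < high)
        result[name] = float((spectrum[:, mask].sum(axis=1) / total).mean())
    return result


def band_power_gap(real, synthetic, fs):
    """Synthetic minus real relative band power; values near 0 mean a band is well matched."""
    real_power = relative_band_power(real, fs)
    synthetic_power = relative_band_power(synthetic, fs)
    return {name: synthetic_power[name] - real_power[name] for name in BANDS}


# Toy stand-ins (hypothetical): "real" epochs carry a 10 Hz rhythm, "synthetic" ones 6 Hz.
fs = 256
t = np.arange(2 * fs) / fs
rng = np.random.default_rng(0)
real = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal((20, t.size))
synthetic = np.sin(2 * np.pi * 6 * t) + 0.5 * rng.standard_normal((20, t.size))

real_power = relative_band_power(real, fs)
gap = band_power_gap(real, synthetic, fs)
print(real_power)
print(gap)
```

In this toy case the gap is negative in the alpha band and positive in the theta band, which is the kind of per-feature mismatch a frequency-domain check is meant to expose. A per-band comparison like this only covers spectral power; it says nothing about temporal or spatial structure.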
format Online
Article
Text
id pubmed-10577178
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10577178 2023-10-17 Generation of synthetic EEG data for training algorithms supporting the diagnosis of major depressive disorder Carrle, Friedrich Philipp Hollenbenders, Yasmin Reichenbach, Alexandra Front Neurosci Neuroscience INTRODUCTION: Major depressive disorder (MDD) is the most common mental disorder worldwide, leading to impairment in quality and independence of life. Electroencephalography (EEG) biomarkers processed with machine learning (ML) algorithms have been explored for objective diagnoses with promising results. However, the generalizability of those models, a prerequisite for clinical application, is restricted by small datasets. One approach to train ML models with good generalizability is complementing the original with synthetic data produced by generative algorithms. Another advantage of synthetic data is the possibility of publishing the data for other researchers without risking patient data privacy. Synthetic EEG time-series have not yet been generated for two clinical populations like MDD patients and healthy controls. METHODS: We first reviewed 27 studies presenting EEG data augmentation with generative algorithms for classification tasks, like diagnosis, for the possibilities and shortcomings of recent methods. The subsequent empirical study generated EEG time-series based on two public datasets with 30/28 and 24/29 subjects (MDD/controls). To obtain baseline diagnostic accuracies, convolutional neural networks (CNN) were trained with time-series from each dataset. The data were synthesized with generative adversarial networks (GAN) consisting of CNNs. We evaluated the synthetic data qualitatively and quantitatively and finally used it for re-training the diagnostic model. RESULTS: The reviewed studies improved their classification accuracies by between 1 and 40% with the synthetic data. Our own diagnostic accuracy improved up to 10% for one dataset but not significantly for the other. We found a rich repertoire of generative models in the reviewed literature, solving various technical issues. A major shortcoming in the field is the lack of meaningful evaluation metrics for synthetic data. The few studies analyzing the data in the frequency domain, including our own, show that only some features can be produced truthfully. DISCUSSION: The systematic review combined with our own investigation provides an overview of the available methods for generating EEG data for a classification task, their possibilities, and shortcomings. The approach is promising and the technical basis is set. For a broad application of these techniques in neuroscience research or clinical application, the methods need fine-tuning facilitated by domain expertise in (clinical) EEG research. Frontiers Media S.A. 2023-10-02 /pmc/articles/PMC10577178/ /pubmed/37849893 http://dx.doi.org/10.3389/fnins.2023.1219133 Text en Copyright © 2023 Carrle, Hollenbenders and Reichenbach. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Neuroscience
Carrle, Friedrich Philipp
Hollenbenders, Yasmin
Reichenbach, Alexandra
Generation of synthetic EEG data for training algorithms supporting the diagnosis of major depressive disorder
title Generation of synthetic EEG data for training algorithms supporting the diagnosis of major depressive disorder
title_full Generation of synthetic EEG data for training algorithms supporting the diagnosis of major depressive disorder
title_fullStr Generation of synthetic EEG data for training algorithms supporting the diagnosis of major depressive disorder
title_full_unstemmed Generation of synthetic EEG data for training algorithms supporting the diagnosis of major depressive disorder
title_short Generation of synthetic EEG data for training algorithms supporting the diagnosis of major depressive disorder
title_sort generation of synthetic eeg data for training algorithms supporting the diagnosis of major depressive disorder
topic Neuroscience
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10577178/
https://www.ncbi.nlm.nih.gov/pubmed/37849893
http://dx.doi.org/10.3389/fnins.2023.1219133
work_keys_str_mv AT carrlefriedrichphilipp generationofsyntheticeegdatafortrainingalgorithmssupportingthediagnosisofmajordepressivedisorder
AT hollenbendersyasmin generationofsyntheticeegdatafortrainingalgorithmssupportingthediagnosisofmajordepressivedisorder
AT reichenbachalexandra generationofsyntheticeegdatafortrainingalgorithmssupportingthediagnosisofmajordepressivedisorder