
An extended clinical EEG dataset with 15,300 automatically labelled recordings for pathology decoding


Bibliographic Details
Main Authors:	Kiessner, Ann-Kathrin; Schirrmeister, Robin T.; Gemein, Lukas A.W.; Boedecker, Joschka; Ball, Tonio
Format:	Online Article Text
Language:	English
Published:	Elsevier 2023
Subjects:	Regular Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10432245/ https://www.ncbi.nlm.nih.gov/pubmed/37544168 http://dx.doi.org/10.1016/j.nicl.2023.103482

collection	PubMed
description	Automated clinical EEG analysis using machine learning (ML) methods is a growing EEG research area. Previous studies on binary EEG pathology decoding have mainly used the Temple University Hospital (TUH) Abnormal EEG Corpus (TUAB), which contains approximately 3,000 manually labelled EEG recordings. To evaluate and eventually even improve the generalisation performance of machine learning methods for EEG pathology decoding, larger, publicly available datasets are required. A number of studies have addressed the automatic labelling of large open-source datasets as an approach to creating new datasets for EEG pathology decoding, but little is known about the extent to which training on a larger, automatically labelled dataset affects the decoding performance of established deep neural networks. In this study, we automatically created additional pathology labels for the Temple University Hospital (TUH) EEG Corpus (TUEG) based on the medical reports using a rule-based text classifier. We generated a dataset of 15,300 newly labelled recordings, which we call the TUH Abnormal Expansion EEG Corpus (TUABEX), and which is five times larger than the TUAB. Since the TUABEX contains more pathological (75%) than non-pathological (25%) recordings, we then selected a balanced subset of 8,879 recordings, the TUH Abnormal Expansion Balanced EEG Corpus (TUABEXB). To investigate how training on a larger, automatically labelled dataset affects the decoding performance of deep neural networks, we applied four established deep convolutional neural networks (ConvNets) to the task of pathological versus non-pathological classification and compared the performance of each architecture after training on different datasets. The results show that training on the automatically labelled TUABEXB dataset rather than on the manually labelled TUAB dataset increases accuracies on TUABEXB and, for some architectures, even on TUAB itself. We argue that automatic labelling of large open-source datasets can be used to efficiently utilise the massive amount of EEG data stored in clinical archives. We make the proposed TUABEXB available open source and thus offer a new dataset for EEG machine learning research.
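
The two data-preparation steps named in the description (rule-based labelling of medical reports, then selection of a balanced subset) can be illustrated with a minimal Python sketch. This record does not give the authors' actual rules or balancing procedure, so everything here is an assumption: the keyword patterns, the function names `label_report` and `balance`, and the simple downsampling of the majority class are hypothetical stand-ins.

```python
import random
import re
from typing import List, Optional, Tuple

# Hypothetical keyword rules; the rules used to build TUABEX are not
# stated in this record.
ABNORMAL_PATTERNS = [r"\babnormal eeg\b", r"\bthis is an abnormal\b"]
NORMAL_PATTERNS = [r"\bnormal eeg\b", r"\bwithin normal limits\b"]


def label_report(report: str) -> Optional[bool]:
    """Return True (pathological), False (non-pathological) or None.

    None means the rules gave no decision: either no pattern matched
    or both kinds matched, so the recording would be left unlabelled.
    """
    text = report.lower()
    abnormal = any(re.search(p, text) for p in ABNORMAL_PATTERNS)
    normal = any(re.search(p, text) for p in NORMAL_PATTERNS)
    if abnormal == normal:
        return None
    return abnormal


def balance(
    labelled: List[Tuple[str, bool]], seed: int = 0
) -> List[Tuple[str, bool]]:
    """Downsample the larger class so both classes have equal size.

    This is an assumed balancing strategy, not the published one.
    """
    rng = random.Random(seed)
    pathological = [item for item in labelled if item[1]]
    non_pathological = [item for item in labelled if not item[1]]
    n = min(len(pathological), len(non_pathological))
    subset = rng.sample(pathological, n) + rng.sample(non_pathological, n)
    rng.shuffle(subset)
    return subset
```

Calling `label_report` on each report text and passing the decided `(recording_id, label)` pairs to `balance` yields a class-balanced list. Returning `None` for undecided reports keeps ambiguous recordings out of the labelled set rather than forcing a guess.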
id	pubmed-10432245
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-10432245 2023-08-18 Neuroimage Clin Regular Article Elsevier 2023-07-28 /pmc/articles/PMC10432245/ /pubmed/37544168 http://dx.doi.org/10.1016/j.nicl.2023.103482 Text en © 2023 Published by Elsevier Inc. https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).