
Design and technical validation to generate a synthetic 12-lead electrocardiogram dataset to promote artificial intelligence research

PURPOSE: The purpose of this study is to construct a synthetic dataset of ECG signals that overcomes the sensitivity of personal information and the complexity of disclosure policies. METHODS: The public dataset was constructed by generating synthetic data based on the deep learning model using a con...


Bibliographic Details
Main authors: Yoo, Hakje; Moon, Jose; Kim, Jong-Ho; Joo, Hyung Joon
Format: Online Article Text
Language: English
Published: Springer International Publishing 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10468461/
https://www.ncbi.nlm.nih.gov/pubmed/37662618
http://dx.doi.org/10.1007/s13755-023-00241-y
author Yoo, Hakje
Moon, Jose
Kim, Jong-Ho
Joo, Hyung Joon
collection PubMed
description PURPOSE: The purpose of this study is to construct a synthetic dataset of ECG signals that overcomes the sensitivity of personal information and the complexity of disclosure policies. METHODS: The public dataset was constructed by generating synthetic data with a deep learning model that combines a convolutional neural network (CNN) and bi-directional long short-term memory (Bi-LSTM), and the effectiveness of the dataset was verified by developing classification models for ECG diagnoses. RESULTS: The generated synthetic 12-lead ECG dataset consists of a total of 6000 ECGs, covering one normal group and five abnormal groups. The synthetic ECG signal has a waveform pattern similar to that of the original ECG signal: the average RMSE between the two signals is 0.042 µV, and the average cosine similarity is 0.993. In addition, five classification models were developed to verify the effectiveness of the synthetic dataset, and they performed similarly to the models built with the real dataset. In particular, even when the real dataset was used as the test set for classification models trained on the synthetic dataset, all models classified with high accuracy (average accuracy 93.41%). CONCLUSION: The synthetic 12-lead ECG dataset was confirmed to perform similarly to real-world 12-lead ECGs in the classification models. This implies that a synthetic dataset can perform similarly to a real dataset in clinical research using AI. The synthetic dataset generation process in this study provides a way to overcome the medical data disclosure challenges imposed by privacy rights, encourages open data policies, and contributes significantly to promoting cardiovascular disease research.
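The two similarity metrics quoted in the abstract, RMSE and cosine similarity, can be illustrated with a minimal sketch. The record does not say how the authors computed them (per lead, per beat, or over whole recordings), so the function names, the flat single-lead signals, and the toy sine-wave data below are assumptions for illustration only, not the authors' method.

```python
import math


def rmse(real, synthetic):
    """Root-mean-square error between two equal-length signals."""
    if len(real) != len(synthetic) or len(real) == 0:
        raise ValueError("signals must be non-empty and of equal length")
    squared_error = sum((r - s) ** 2 for r, s in zip(real, synthetic))
    return math.sqrt(squared_error / len(real))


def cosine_similarity(real, synthetic):
    """Cosine of the angle between two signals treated as vectors."""
    if len(real) != len(synthetic) or len(real) == 0:
        raise ValueError("signals must be non-empty and of equal length")
    dot = sum(r * s for r, s in zip(real, synthetic))
    norm_real = math.sqrt(sum(r * r for r in real))
    norm_synthetic = math.sqrt(sum(s * s for s in synthetic))
    if norm_real == 0.0 or norm_synthetic == 0.0:
        raise ValueError("cosine similarity is undefined for an all-zero signal")
    return dot / (norm_real * norm_synthetic)


# Hypothetical toy data: a 5 Hz sine standing in for one ECG lead, and a
# copy with a small high-frequency perturbation standing in for the
# synthetic signal. Real ECG samples would be loaded from files instead.
N_SAMPLES = 500
real = [math.sin(2 * math.pi * 5 * i / N_SAMPLES) for i in range(N_SAMPLES)]
synthetic = [
    value + 0.01 * math.cos(2 * math.pi * 50 * i / N_SAMPLES)
    for i, value in enumerate(real)
]

print(f"RMSE: {rmse(real, synthetic):.4f}")
print(f"Cosine similarity: {cosine_similarity(real, synthetic):.4f}")
```

A low RMSE together with a cosine similarity near 1 indicates that the two waveforms agree in both amplitude and shape. For a 12-lead recording, one plausible extension is to compute both metrics per lead and average them, but that choice is likewise an assumption.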
format Online
Article
Text
id pubmed-10468461
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-10468461 2023-09-01 Design and technical validation to generate a synthetic 12-lead electrocardiogram dataset to promote artificial intelligence research Yoo, Hakje; Moon, Jose; Kim, Jong-Ho; Joo, Hyung Joon Health Inf Sci Syst Research
Springer International Publishing 2023-08-30 /pmc/articles/PMC10468461/ /pubmed/37662618 http://dx.doi.org/10.1007/s13755-023-00241-y Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Design and technical validation to generate a synthetic 12-lead electrocardiogram dataset to promote artificial intelligence research
topic Research