SinGAN-Seg: Synthetic training data generation for medical image segmentation
Main authors: | Thambawita, Vajira; Salehi, Pegah; Sheshkal, Sajad Amouei; Hicks, Steven A.; Hammer, Hugo L.; Parasa, Sravanthi; de Lange, Thomas; Halvorsen, Pål; Riegler, Michael A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9060378/ https://www.ncbi.nlm.nih.gov/pubmed/35500005 http://dx.doi.org/10.1371/journal.pone.0267976 |
Field | Value |
---|---|
author | Thambawita, Vajira; Salehi, Pegah; Sheshkal, Sajad Amouei; Hicks, Steven A.; Hammer, Hugo L.; Parasa, Sravanthi; de Lange, Thomas; Halvorsen, Pål; Riegler, Michael A. |
collection | PubMed |
description | Analyzing medical data to find abnormalities is a time-consuming and costly task, particularly for rare abnormalities, requiring tremendous efforts from medical experts. Therefore, artificial intelligence has become a popular tool for the automatic processing of medical data, acting as a supportive tool for doctors. However, the machine learning models used to build these tools are highly dependent on the data used to train them. Large amounts of data can be difficult to obtain in medicine due to privacy reasons, expensive and time-consuming annotations, and a general lack of data samples for infrequent lesions. In this study, we present a novel synthetic data generation pipeline, called SinGAN-Seg, to produce synthetic medical images with corresponding masks using a single training image. Our method differs from traditional generative adversarial networks (GANs) because our model needs only a single image and the corresponding ground truth to train. We also show that the synthetic data generation pipeline can be used to produce alternative artificial segmentation datasets with corresponding ground truth masks when real datasets cannot be shared. The pipeline is evaluated using qualitative and quantitative comparisons between real data and synthetic data, which show that the style transfer technique used in our pipeline significantly improves the quality of the generated data and that our method is better than other state-of-the-art GANs at preparing synthetic images when the size of the training dataset is limited. By training UNet++ on both real data and synthetic data generated by the SinGAN-Seg pipeline, we show that models trained on synthetic data perform very close to those trained on real data when both datasets contain a considerable amount of training data. In contrast, we show that synthetic data generated by the SinGAN-Seg pipeline improves the performance of segmentation models when the training datasets do not contain a considerable amount of data. All experiments were performed using an open dataset, and the code is publicly available on GitHub. |
format | Online Article Text |
id | pubmed-9060378 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
journal | PLoS One, Research Article, published 2022-05-02 (record pubmed-9060378, 2022-05-03) |
license | © 2022 Thambawita et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | SinGAN-Seg: Synthetic training data generation for medical image segmentation |
topic | Research Article |
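The record's description says SinGAN-Seg produces synthetic medical images together with their masks from a single training image and its ground truth. The abstract does not say how the mask travels through the generator; one plausible approach, assumed here, is to stack the binary mask as a fourth channel next to the RGB image so that a single generator learns both, and then threshold the generated mask channel to recover a usable segmentation mask. The sketch below shows only that pairing step. The GAN itself is not shown: `fake` is a hypothetical stand-in produced by adding noise, and `stack_image_and_mask` and `split_generated_sample` are illustrative names, not functions from the SinGAN-Seg code.

```python
import numpy as np


def stack_image_and_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Combine an RGB image (H, W, 3) with values in [0, 1] and a binary
    mask (H, W) into one 4-channel array (H, W, 4).

    Hypothetical helper: treating the mask as an extra channel is an
    assumption used for illustration, not a documented SinGAN-Seg API.
    """
    if image.shape[:2] != mask.shape:
        raise ValueError("image and mask must share height and width")
    return np.concatenate([image, mask[..., None].astype(image.dtype)], axis=-1)


def split_generated_sample(sample: np.ndarray, threshold: float = 0.5):
    """Split a generated 4-channel sample back into an RGB image and a
    binary mask. The generated mask channel is continuous, so it is
    thresholded to produce a 0/1 mask."""
    image = np.clip(sample[..., :3], 0.0, 1.0)
    mask = (sample[..., 3] >= threshold).astype(np.uint8)
    return image, mask


# Usage with toy data (all values here are made up for illustration).
rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))
mask = np.zeros((64, 64))
mask[16:48, 16:48] = 1  # square "lesion" region

pair = stack_image_and_mask(image, mask)

# Hypothetical stand-in for a generator output: the real pair plus small noise.
fake = pair + rng.normal(0.0, 0.05, pair.shape)
synthetic_image, synthetic_mask = split_generated_sample(fake)
```

Keeping the image and mask in one array means any spatial change the generator makes to the image is made to the mask in the same pass, which is what keeps a synthetic pair aligned for segmentation training.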