A Large-scale Synthetic Pathological Dataset for Deep Learning-enabled Segmentation of Breast Cancer
Training computer-vision models successfully relies heavily on large-scale, real-world images with annotations. Yet such annotation-ready datasets are difficult to curate in pathology because of privacy protections and the heavy annotation burden. To aid in computational pathology,...
Main authors: | Ding, Kexin; Zhou, Mu; Wang, He; Gevaert, Olivier; Metaxas, Dimitris; Zhang, Shaoting |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10121551/ https://www.ncbi.nlm.nih.gov/pubmed/37085533 http://dx.doi.org/10.1038/s41597-023-02125-y |
_version_ | 1785029395090505728 |
---|---|
author | Ding, Kexin; Zhou, Mu; Wang, He; Gevaert, Olivier; Metaxas, Dimitris; Zhang, Shaoting |
author_sort | Ding, Kexin |
collection | PubMed |
description | Training computer-vision models successfully relies heavily on large-scale, real-world images with annotations. Yet such annotation-ready datasets are difficult to curate in pathology because of privacy protections and the heavy annotation burden. To aid in computational pathology, synthetic data generation, curation, and annotation offer a cost-effective means to quickly provide the data diversity required to boost model performance at different stages. In this study, we introduce a large-scale synthetic pathological image dataset paired with annotations for nuclei semantic segmentation, termed Synthetic Nuclei and annOtation Wizard (SNOW). SNOW is developed via a standardized workflow that applies an off-the-shelf image generator and nuclei annotator. In total, the dataset contains 20k image tiles and 1,448,522 annotated nuclei, released under the CC-BY license. We show that SNOW can be used in both supervised and semi-supervised training scenarios. Extensive results suggest that models trained on synthetic data are competitive under a variety of training settings, broadening the use of synthetic images for enhancing downstream data-driven clinical tasks. |
format | Online Article Text |
id | pubmed-10121551 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-10121551 2023-04-23 Sci Data Data Descriptor Nature Publishing Group UK 2023-04-21 /pmc/articles/PMC10121551/ /pubmed/37085533 http://dx.doi.org/10.1038/s41597-023-02125-y Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
title | A Large-scale Synthetic Pathological Dataset for Deep Learning-enabled Segmentation of Breast Cancer |
topic | Data Descriptor |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10121551/ https://www.ncbi.nlm.nih.gov/pubmed/37085533 http://dx.doi.org/10.1038/s41597-023-02125-y |