A Large-scale Synthetic Pathological Dataset for Deep Learning-enabled Segmentation of Breast Cancer

The success of training computer-vision models heavily relies on the support of large-scale, real-world images with annotations. Yet such an annotation-ready dataset is difficult to curate in pathology due to the privacy protection and excessive annotation burden. To aid in computational pathology,...

Bibliographic Details
Main authors: Ding, Kexin; Zhou, Mu; Wang, He; Gevaert, Olivier; Metaxas, Dimitris; Zhang, Shaoting
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects: Data Descriptor
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10121551/
https://www.ncbi.nlm.nih.gov/pubmed/37085533
http://dx.doi.org/10.1038/s41597-023-02125-y
author Ding, Kexin
Zhou, Mu
Wang, He
Gevaert, Olivier
Metaxas, Dimitris
Zhang, Shaoting
collection PubMed
description The success of training computer-vision models heavily relies on the support of large-scale, real-world images with annotations. Yet such an annotation-ready dataset is difficult to curate in pathology due to the privacy protection and excessive annotation burden. To aid in computational pathology, synthetic data generation, curation, and annotation present a cost-effective means to quickly enable data diversity that is required to boost model performance at different stages. In this study, we introduce a large-scale synthetic pathological image dataset paired with the annotation for nuclei semantic segmentation, termed as Synthetic Nuclei and annOtation Wizard (SNOW). The proposed SNOW is developed via a standardized workflow by applying the off-the-shelf image generator and nuclei annotator. The dataset contains overall 20k image tiles and 1,448,522 annotated nuclei with the CC-BY license. We show that SNOW can be used in both supervised and semi-supervised training scenarios. Extensive results suggest that synthetic-data-trained models are competitive under a variety of model training settings, expanding the scope of better using synthetic images for enhancing downstream data-driven clinical tasks.
id pubmed-10121551
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-10121551 2023-04-23 Sci Data Data Descriptor
Nature Publishing Group UK 2023-04-21 /pmc/articles/PMC10121551/ /pubmed/37085533 http://dx.doi.org/10.1038/s41597-023-02125-y Text en
© The Author(s) 2023. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title A Large-scale Synthetic Pathological Dataset for Deep Learning-enabled Segmentation of Breast Cancer
topic Data Descriptor