Dataset of segmented nuclei in hematoxylin and eosin stained histopathology images of ten cancer types

The distribution and appearance of nuclei are essential markers for the diagnosis and study of cancer. Despite the importance of nuclear morphology, there is a lack of large scale, accurate, publicly accessible nucleus segmentation data. To address this, we developed an analysis pipeline that segmen...

Bibliographic Details
Main authors: Hou, Le; Gupta, Rajarsi; Van Arnam, John S.; Zhang, Yuwei; Sivalenka, Kaustubh; Samaras, Dimitris; Kurc, Tahsin M.; Saltz, Joel H.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Subjects: Data Descriptor
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7305328/
https://www.ncbi.nlm.nih.gov/pubmed/32561748
http://dx.doi.org/10.1038/s41597-020-0528-1
_version_ 1783548436684996608
author Hou, Le
Gupta, Rajarsi
Van Arnam, John S.
Zhang, Yuwei
Sivalenka, Kaustubh
Samaras, Dimitris
Kurc, Tahsin M.
Saltz, Joel H.
author_facet Hou, Le
Gupta, Rajarsi
Van Arnam, John S.
Zhang, Yuwei
Sivalenka, Kaustubh
Samaras, Dimitris
Kurc, Tahsin M.
Saltz, Joel H.
author_sort Hou, Le
collection PubMed
description The distribution and appearance of nuclei are essential markers for the diagnosis and study of cancer. Despite the importance of nuclear morphology, there is a lack of large scale, accurate, publicly accessible nucleus segmentation data. To address this, we developed an analysis pipeline that segments nuclei in whole slide tissue images from multiple cancer types with a quality control process. We have generated nucleus segmentation results in 5,060 Whole Slide Tissue images from 10 cancer types in The Cancer Genome Atlas. One key component of our work is that we carried out a multi-level quality control process (WSI-level and image patch-level), to evaluate the quality of our segmentation results. The image patch-level quality control used manual segmentation ground truth data from 1,356 sampled image patches. The datasets we publish in this work consist of roughly 5 billion quality controlled nuclei from more than 5,060 TCGA WSIs from 10 different TCGA cancer types and 1,356 manually segmented TCGA image patches from the same 10 cancer types plus additional 4 cancer types.
format Online
Article
Text
id pubmed-7305328
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-7305328 2020-06-26 Dataset of segmented nuclei in hematoxylin and eosin stained histopathology images of ten cancer types Hou, Le Gupta, Rajarsi Van Arnam, John S. Zhang, Yuwei Sivalenka, Kaustubh Samaras, Dimitris Kurc, Tahsin M. Saltz, Joel H. Sci Data Data Descriptor The distribution and appearance of nuclei are essential markers for the diagnosis and study of cancer. Despite the importance of nuclear morphology, there is a lack of large scale, accurate, publicly accessible nucleus segmentation data. To address this, we developed an analysis pipeline that segments nuclei in whole slide tissue images from multiple cancer types with a quality control process. We have generated nucleus segmentation results in 5,060 Whole Slide Tissue images from 10 cancer types in The Cancer Genome Atlas. One key component of our work is that we carried out a multi-level quality control process (WSI-level and image patch-level), to evaluate the quality of our segmentation results. The image patch-level quality control used manual segmentation ground truth data from 1,356 sampled image patches. The datasets we publish in this work consist of roughly 5 billion quality controlled nuclei from more than 5,060 TCGA WSIs from 10 different TCGA cancer types and 1,356 manually segmented TCGA image patches from the same 10 cancer types plus additional 4 cancer types. Nature Publishing Group UK 2020-06-19 /pmc/articles/PMC7305328/ /pubmed/32561748 http://dx.doi.org/10.1038/s41597-020-0528-1 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.
spellingShingle Data Descriptor
Hou, Le
Gupta, Rajarsi
Van Arnam, John S.
Zhang, Yuwei
Sivalenka, Kaustubh
Samaras, Dimitris
Kurc, Tahsin M.
Saltz, Joel H.
Dataset of segmented nuclei in hematoxylin and eosin stained histopathology images of ten cancer types
title Dataset of segmented nuclei in hematoxylin and eosin stained histopathology images of ten cancer types
topic Data Descriptor
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7305328/
https://www.ncbi.nlm.nih.gov/pubmed/32561748
http://dx.doi.org/10.1038/s41597-020-0528-1