A curated mammography data set for use in computer-aided detection and diagnosis research

Published research results are difficult to replicate due to the lack of a standard evaluation data set in the area of decision support systems in mammography; most computer-aided diagnosis (CADx) and detection (CADe) algorithms for breast cancer in mammography are evaluated on private data sets or...

Bibliographic Details
Main authors: Lee, Rebecca Sawyer; Gimenez, Francisco; Hoogi, Assaf; Miyake, Kanae Kawai; Gorovoy, Mia; Rubin, Daniel L.
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5735920/
https://www.ncbi.nlm.nih.gov/pubmed/29257132
http://dx.doi.org/10.1038/sdata.2017.177
author Lee, Rebecca Sawyer
Gimenez, Francisco
Hoogi, Assaf
Miyake, Kanae Kawai
Gorovoy, Mia
Rubin, Daniel L.
collection PubMed
description Published research results are difficult to replicate due to the lack of a standard evaluation data set in the area of decision support systems in mammography; most computer-aided diagnosis (CADx) and detection (CADe) algorithms for breast cancer in mammography are evaluated on private data sets or on unspecified subsets of public databases. This causes an inability to directly compare the performance of methods or to replicate prior results. We seek to resolve this substantial challenge by releasing an updated and standardized version of the Digital Database for Screening Mammography (DDSM) for evaluation of future CADx and CADe systems (sometimes referred to generally as CAD) research in mammography. Our data set, the CBIS-DDSM (Curated Breast Imaging Subset of DDSM), includes decompressed images, data selection and curation by trained mammographers, updated mass segmentation and bounding boxes, and pathologic diagnosis for training data, formatted similarly to modern computer vision data sets. The data set contains 753 calcification cases and 891 mass cases, providing a data-set size capable of analyzing decision support systems in mammography.
format Online
Article
Text
id pubmed-5735920
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-5735920 2017-12-21 Sci Data Data Descriptor Nature Publishing Group 2017-12-19 /pmc/articles/PMC5735920/ /pubmed/29257132 http://dx.doi.org/10.1038/sdata.2017.177 Text en Copyright © 2017, The Author(s) http://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files made available in this article.
title A curated mammography data set for use in computer-aided detection and diagnosis research
topic Data Descriptor