A Multi-million Mammography Image Dataset and Population-Based Screening Cohort for the Training and Evaluation of Deep Neural Networks—the Cohort of Screen-Aged Women (CSAW)

For AI researchers, access to a large and well-curated dataset is crucial. Working in the field of breast radiology, our aim was to develop a high-quality platform that can be used for evaluation of networks aiming to predict breast cancer risk, estimate mammographic sensitivity, and detect tumors....

Bibliographic Details
Main authors: Dembrower, Karin, Lindholm, Peter, Strand, Fredrik
Format: Online Article Text
Language: English
Published: Springer International Publishing 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7165146/
https://www.ncbi.nlm.nih.gov/pubmed/31520277
http://dx.doi.org/10.1007/s10278-019-00278-0
_version_ 1783523418725941248
author Dembrower, Karin
Lindholm, Peter
Strand, Fredrik
collection PubMed
description For AI researchers, access to a large and well-curated dataset is crucial. Working in the field of breast radiology, our aim was to develop a high-quality platform that can be used for evaluation of networks aiming to predict breast cancer risk, estimate mammographic sensitivity, and detect tumors. Our dataset, Cohort of Screen-Aged Women (CSAW), is a population-based cohort of all women 40 to 74 years of age invited to screening in the Stockholm region, Sweden, between 2008 and 2015. All women were invited to mammography screening every 18 to 24 months free of charge. Images were collected from the PACS of the three breast centers that completely cover the region. DICOM metadata were collected together with the images. Screening decisions and clinical outcome data were collected by linkage to the regional cancer center registers. Incident cancer cases, from one center, were pixel-level annotated by a radiologist. A separate subset for efficient evaluation of external networks was defined for the uptake area of one center. The collection and use of the dataset for the purpose of AI research has been approved by the Ethical Review Board. CSAW included 499,807 women invited to screening between 2008 and 2015 with a total of 1,182,733 completed screening examinations. Around 2 million mammography images have currently been collected, including all images for women who developed breast cancer. There were 10,582 women diagnosed with breast cancer; for 8,463, it was their first breast cancer. Clinical data include biopsy-verified breast cancer diagnoses, histological origin, tumor size, lymph node status, Elston grade, and receptor status. A total of 1,891 images from 898 women had tumors annotated at the pixel level, including any tumor signs in the prior negative screening mammogram. Our dataset has already been used for evaluation by several research groups. We have defined a high-volume platform for training and evaluation of deep neural networks in the domain of mammographic imaging. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1007/s10278-019-00278-0) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-7165146
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-7165146 2020-04-24 J Digit Imaging Article Springer International Publishing 2019-09-13 2020-04 /pmc/articles/PMC7165146/ /pubmed/31520277 http://dx.doi.org/10.1007/s10278-019-00278-0 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
title A Multi-million Mammography Image Dataset and Population-Based Screening Cohort for the Training and Evaluation of Deep Neural Networks—the Cohort of Screen-Aged Women (CSAW)
topic Article