
A review of the machine learning datasets in mammography, their adherence to the FAIR principles and the outlook for the future

Bibliographic details
Main authors:	Logan, Joe; Kennedy, Paul J.; Catchpoole, Daniel
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2023
Subjects:	Analysis
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10491669/ https://www.ncbi.nlm.nih.gov/pubmed/37684306 http://dx.doi.org/10.1038/s41597-023-02430-6

author	Logan, Joe; Kennedy, Paul J.; Catchpoole, Daniel
collection	PubMed
description	The increasing rates of breast cancer, particularly in emerging economies, have led to interest in scalable deep learning-based solutions that improve the accuracy and cost-effectiveness of mammographic screening. However, such tools require large volumes of high-quality training data, which can be challenging to obtain. This paper combines the experience of an AI startup with an analysis of the FAIR principles of the eight available datasets. It demonstrates that the datasets vary considerably, particularly in their interoperability, as each dataset is skewed towards a particular clinical use-case. Additionally, the mix of digital captures and scanned film compounds the problem of variability, along with differences in licensing terms, ease of access, labelling reliability, and file formats. Improving interoperability through adherence to standards such as the BIRADS criteria for labelling and annotation, and a consistent file format, could markedly improve access and use of larger amounts of standardized data. This, in turn, could be increased further by GAN-based synthetic data generation, paving the way towards better health outcomes for breast cancer.
format	Online Article Text
id	pubmed-10491669
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
journal	Sci Data (published 2023-09-08)
rights	© The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.