Classification performance bias between training and test sets in a limited mammography dataset

OBJECTIVES: To assess the performance bias caused by sampling data into training and test sets in a mammography radiomics study. METHODS: Mammograms from 700 women were used to study upstaging of ductal carcinoma in situ. The dataset was repeatedly shuffled and split into training (n=400) and test c...

Bibliographic Details
Main Authors:	Hou, Rui, Lo, Joseph Y., Marks, Jeffrey R., Hwang, E. Shelley, Grimm, Lars J.
Format:	Online Article Text
Language:	English
Published:	Cold Spring Harbor Laboratory 2023
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9980247/ https://www.ncbi.nlm.nih.gov/pubmed/36865183 http://dx.doi.org/10.1101/2023.02.15.23285985

_version_	1784899876593598464
author	Hou, Rui; Lo, Joseph Y.; Marks, Jeffrey R.; Hwang, E. Shelley; Grimm, Lars J.
author_sort	Hou, Rui
collection	PubMed
description	OBJECTIVES: To assess the performance bias caused by sampling data into training and test sets in a mammography radiomics study. METHODS: Mammograms from 700 women were used to study upstaging of ductal carcinoma in situ. The dataset was repeatedly shuffled and split into training (n=400) and test cases (n=300) forty times. For each split, cross-validation was used for training, followed by an assessment of the test set. Logistic regression with regularization and support vector machine were used as the machine learning classifiers. For each split and classifier type, multiple models were created based on radiomics and/or clinical features. RESULTS: Area under the curve (AUC) performances varied considerably across the different data splits (e.g., radiomics regression model: train 0.58–0.70, test 0.59–0.73). Performances for regression models showed a tradeoff where better training led to worse testing and vice versa. Cross-validation over all cases reduced this variability, but required samples of 500+ cases to yield representative estimates of performance. CONCLUSIONS: In medical imaging, clinical datasets are often limited to relatively small size. Models built from different training sets may not be representative of the whole dataset. Depending on the selected data split and model, performance bias could lead to inappropriate conclusions that might influence the clinical significance of the findings. Optimal strategies for test set selection should be developed to ensure study conclusions are appropriate.
format	Online Article Text
id	pubmed-9980247
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Cold Spring Harbor Laboratory
record_format	MEDLINE/PubMed
spelling	pubmed-9980247 2023-03-03 medRxiv Article Cold Spring Harbor Laboratory 2023-02-23 /pmc/articles/PMC9980247/ /pubmed/36865183 http://dx.doi.org/10.1101/2023.02.15.23285985 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
title	Classification performance bias between training and test sets in a limited mammography dataset
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9980247/ https://www.ncbi.nlm.nih.gov/pubmed/36865183 http://dx.doi.org/10.1101/2023.02.15.23285985
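
The repeated shuffle-and-split evaluation described in the abstract (400 training cases, 300 test cases, forty splits, AUC per split) can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic, the function names (make_synthetic, fit_logistic, repeated_splits) and all hyperparameters are assumptions made for illustration, and the cross-validation step and the support vector machine mentioned in the abstract are omitted to keep the example short and free of external dependencies.

```python
import math
import random


def auc(scores, labels):
    """Area under the ROC curve from pairwise comparisons (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_logistic(X, y, l2=0.01, lr=0.1, epochs=100):
    """L2-regularized logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            gb += err
            for j in range(d):
                gw[j] += err * xi[j]
        # Penalize the weights only; the intercept is left unregularized.
        w = [wj - lr * (gwj / n + l2 * wj) for wj, gwj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(model, X):
    """Return linear scores; AUC depends only on their ranking."""
    w, b = model
    return [b + sum(wj * xj for wj, xj in zip(w, xi)) for xi in X]


def make_synthetic(n=700, d=4, seed=0):
    """Hypothetical stand-in data: weak signal on the first feature only."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        xi = [rng.gauss(0, 1) for _ in range(d)]
        p = 1.0 / (1.0 + math.exp(-0.5 * xi[0]))
        X.append(xi)
        y.append(1 if rng.random() < p else 0)
    return X, y


def repeated_splits(X, y, n_train=400, n_splits=40, seed=1):
    """Shuffle, split, fit, and record (train AUC, test AUC) for each split."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    results = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        tr, te = idx[:n_train], idx[n_train:]
        Xtr, ytr = [X[i] for i in tr], [y[i] for i in tr]
        Xte, yte = [X[i] for i in te], [y[i] for i in te]
        model = fit_logistic(Xtr, ytr)
        results.append((auc(predict(model, Xtr), ytr),
                        auc(predict(model, Xte), yte)))
    return results


if __name__ == "__main__":
    X, y = make_synthetic()
    res = repeated_splits(X, y)
    train = [r[0] for r in res]
    test = [r[1] for r in res]
    print(f"train AUC range: {min(train):.2f} to {max(train):.2f}")
    print(f"test  AUC range: {min(test):.2f} to {max(test):.2f}")
```

Running the script prints the range of train and test AUC across the forty splits. The spread of those ranges on a fixed dataset is the kind of split-dependent variability the abstract reports, although the numbers from this synthetic example say nothing about the study's actual results.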
