
Performance metric curve analysis framework to assess impact of the decision variable threshold, disease prevalence, and dataset variability in two-class classification

PURPOSE: The aim of this study is to (1) demonstrate a graphical method and interpretation framework to extend performance evaluation beyond receiver operating characteristic curve analysis and (2) assess the impact of disease prevalence and variability in training and testing sets, particularly when a specific operating point is used.


Bibliographic Details
Main Authors: Whitney, Heather M., Drukker, Karen, Giger, Maryellen L.
Format: Online Article Text
Language: English
Published: Society of Photo-Optical Instrumentation Engineers 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9152992/
https://www.ncbi.nlm.nih.gov/pubmed/35656541
http://dx.doi.org/10.1117/1.JMI.9.3.035502
_version_ 1784717756826910720
author Whitney, Heather M.
Drukker, Karen
Giger, Maryellen L.
author_facet Whitney, Heather M.
Drukker, Karen
Giger, Maryellen L.
author_sort Whitney, Heather M.
collection PubMed
description PURPOSE: The aim of this study is to (1) demonstrate a graphical method and interpretation framework to extend performance evaluation beyond receiver operating characteristic curve analysis and (2) assess the impact of disease prevalence and variability in training and testing sets, particularly when a specific operating point is used. APPROACH: The proposed performance metric curves (PMCs) simultaneously assess sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), and the 95% confidence intervals thereof, as a function of the threshold for the decision variable. We investigated the utility of PMCs using six example operating points associated with commonly used methods to select operating points (including the Youden index and maximum mutual information). As an example, we applied PMCs to the task of distinguishing between malignant and benign breast lesions using human-engineered radiomic features extracted from dynamic contrast-enhanced magnetic resonance images. The dataset had 1885 lesions, with the images acquired in 2015 and 2016 serving as the training set (1450 lesions) and those acquired in 2017 as the test set (435 lesions). Our study used this dataset in two ways: (1) the clinical dataset itself and (2) simulated datasets with features based on the clinical set but with five different disease prevalences. The median and 95% CI of the number of type I (false positive) and type II (false negative) errors were determined for each operating point of interest. RESULTS: PMCs from both the clinical and simulated datasets demonstrated that PMCs could support interpretation of the impact of decision threshold choice on type I and type II errors of classification, particularly relevant to prevalence. CONCLUSION: PMCs allow simultaneous evaluation of the four performance metrics of sensitivity, specificity, PPV, and NPV as a function of the decision threshold. This may create a better understanding of two-class classifier performance in machine learning.
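As an illustration of the computation the abstract describes, the following is a minimal Python sketch that traces the four performance metrics (sensitivity, specificity, PPV, NPV) across decision-variable thresholds and locates a Youden-index operating point. It is not the authors' code: the function names, the "score at or above threshold counts as positive" convention, and the omission of the bootstrap 95% confidence intervals are assumptions made for illustration only.

```python
# Illustrative sketch of performance metric curves (PMCs), based on the abstract.
# Not the authors' implementation; confidence-interval estimation is omitted.
import numpy as np

def performance_metric_curves(scores, labels, thresholds=None):
    """Compute sensitivity, specificity, PPV, and NPV at each decision threshold.

    scores: 1-D array of classifier decision-variable outputs.
    labels: 1-D array of 0/1 ground truth (1 = positive class, e.g., malignant).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    if thresholds is None:
        thresholds = np.unique(scores)  # evaluate at every observed score value
    metrics = {"threshold": [], "sensitivity": [], "specificity": [],
               "ppv": [], "npv": []}
    for t in thresholds:
        pred = scores >= t                      # assumed convention: positive at/above threshold
        tp = np.sum(pred & (labels == 1))       # true positives
        fp = np.sum(pred & (labels == 0))       # false positives (type I errors)
        fn = np.sum(~pred & (labels == 1))      # false negatives (type II errors)
        tn = np.sum(~pred & (labels == 0))      # true negatives
        metrics["threshold"].append(t)
        metrics["sensitivity"].append(tp / (tp + fn) if tp + fn else np.nan)
        metrics["specificity"].append(tn / (tn + fp) if tn + fp else np.nan)
        metrics["ppv"].append(tp / (tp + fp) if tp + fp else np.nan)
        metrics["npv"].append(tn / (tn + fn) if tn + fn else np.nan)
    return {k: np.asarray(v) for k, v in metrics.items()}

def youden_threshold(pmc):
    """Example operating point: maximize Youden's J = sensitivity + specificity - 1."""
    j = pmc["sensitivity"] + pmc["specificity"] - 1.0
    return pmc["threshold"][np.nanargmax(j)]
```

Plotting the four returned arrays against the threshold axis (with CIs added, e.g., by bootstrapping the test set) would yield curves of the kind the article proposes; prevalence affects PPV and NPV but not sensitivity or specificity, which is why the PMC view separates the threshold-dependent behavior of the two pairs of metrics.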
format Online
Article
Text
id pubmed-9152992
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Society of Photo-Optical Instrumentation Engineers
record_format MEDLINE/PubMed
spelling pubmed-91529922023-05-31 Performance metric curve analysis framework to assess impact of the decision variable threshold, disease prevalence, and dataset variability in two-class classification Whitney, Heather M. Drukker, Karen Giger, Maryellen L. J Med Imaging (Bellingham) Image Perception, Observer Performance, and Technology Assessment PURPOSE: The aim of this study is to (1) demonstrate a graphical method and interpretation framework to extend performance evaluation beyond receiver operating characteristic curve analysis and (2) assess the impact of disease prevalence and variability in training and testing sets, particularly when a specific operating point is used. APPROACH: The proposed performance metric curves (PMCs) simultaneously assess sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), and the 95% confidence intervals thereof, as a function of the threshold for the decision variable. We investigated the utility of PMCs using six example operating points associated with commonly used methods to select operating points (including the Youden index and maximum mutual information). As an example, we applied PMCs to the task of distinguishing between malignant and benign breast lesions using human-engineered radiomic features extracted from dynamic contrast-enhanced magnetic resonance images. The dataset had 1885 lesions, with the images acquired in 2015 and 2016 serving as the training set (1450 lesions) and those acquired in 2017 as the test set (435 lesions). Our study used this dataset in two ways: (1) the clinical dataset itself and (2) simulated datasets with features based on the clinical set but with five different disease prevalences. The median and 95% CI of the number of type I (false positive) and type II (false negative) errors were determined for each operating point of interest. RESULTS: PMCs from both the clinical and simulated datasets demonstrated that PMCs could support interpretation of the impact of decision threshold choice on type I and type II errors of classification, particularly relevant to prevalence. CONCLUSION: PMCs allow simultaneous evaluation of the four performance metrics of sensitivity, specificity, PPV, and NPV as a function of the decision threshold. This may create a better understanding of two-class classifier performance in machine learning. Society of Photo-Optical Instrumentation Engineers 2022-05-31 2022-05 /pmc/articles/PMC9152992/ /pubmed/35656541 http://dx.doi.org/10.1117/1.JMI.9.3.035502 Text en © 2022 Society of Photo-Optical Instrumentation Engineers (SPIE)
spellingShingle Image Perception, Observer Performance, and Technology Assessment
Whitney, Heather M.
Drukker, Karen
Giger, Maryellen L.
Performance metric curve analysis framework to assess impact of the decision variable threshold, disease prevalence, and dataset variability in two-class classification
title Performance metric curve analysis framework to assess impact of the decision variable threshold, disease prevalence, and dataset variability in two-class classification
title_full Performance metric curve analysis framework to assess impact of the decision variable threshold, disease prevalence, and dataset variability in two-class classification
title_fullStr Performance metric curve analysis framework to assess impact of the decision variable threshold, disease prevalence, and dataset variability in two-class classification
title_full_unstemmed Performance metric curve analysis framework to assess impact of the decision variable threshold, disease prevalence, and dataset variability in two-class classification
title_short Performance metric curve analysis framework to assess impact of the decision variable threshold, disease prevalence, and dataset variability in two-class classification
title_sort performance metric curve analysis framework to assess impact of the decision variable threshold, disease prevalence, and dataset variability in two-class classification
topic Image Perception, Observer Performance, and Technology Assessment
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9152992/
https://www.ncbi.nlm.nih.gov/pubmed/35656541
http://dx.doi.org/10.1117/1.JMI.9.3.035502
work_keys_str_mv AT whitneyheatherm performancemetriccurveanalysisframeworktoassessimpactofthedecisionvariablethresholddiseaseprevalenceanddatasetvariabilityintwoclassclassification
AT drukkerkaren performancemetriccurveanalysisframeworktoassessimpactofthedecisionvariablethresholddiseaseprevalenceanddatasetvariabilityintwoclassclassification
AT gigermaryellenl performancemetriccurveanalysisframeworktoassessimpactofthedecisionvariablethresholddiseaseprevalenceanddatasetvariabilityintwoclassclassification