
Performance metric curve analysis framework to assess impact of the decision variable threshold, disease prevalence, and dataset variability in two-class classification

PURPOSE: The aim of this study is to (1) demonstrate a graphical method and interpretation framework to extend performance evaluation beyond receiver operating characteristic curve analysis and (2) assess the impact of disease prevalence and variability in training and testing sets, particularly when a specific operating point is used.


Bibliographic Details
Main Authors: Whitney, Heather M., Drukker, Karen, Giger, Maryellen L.
Format: Online Article Text
Language: English
Published: Society of Photo-Optical Instrumentation Engineers 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9152992/
https://www.ncbi.nlm.nih.gov/pubmed/35656541
http://dx.doi.org/10.1117/1.JMI.9.3.035502
_version_ 1784717756826910720
author Whitney, Heather M.
Drukker, Karen
Giger, Maryellen L.
author_facet Whitney, Heather M.
Drukker, Karen
Giger, Maryellen L.
author_sort Whitney, Heather M.
collection PubMed
description PURPOSE: The aim of this study is to (1) demonstrate a graphical method and interpretation framework to extend performance evaluation beyond receiver operating characteristic curve analysis and (2) assess the impact of disease prevalence and variability in training and testing sets, particularly when a specific operating point is used. APPROACH: The proposed performance metric curves (PMCs) simultaneously assess sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), and the 95% confidence intervals thereof, as a function of the threshold for the decision variable. We investigated the utility of PMCs using six example operating points associated with commonly used methods to select operating points (including the Youden index and maximum mutual information). As an example, we applied PMCs to the task of distinguishing between malignant and benign breast lesions using human-engineered radiomic features extracted from dynamic contrast-enhanced magnetic resonance images. The dataset had 1885 lesions, with the images acquired in 2015 and 2016 serving as the training set (1450 lesions) and those acquired in 2017 as the test set (435 lesions). Our study used this dataset in two ways: (1) the clinical dataset itself and (2) simulated datasets with features based on the clinical set but with five different disease prevalences. The median and 95% CI of the number of type I (false positive) and type II (false negative) errors were determined for each operating point of interest. RESULTS: PMCs from both the clinical and simulated datasets demonstrated that PMCs could support interpretation of the impact of decision threshold choice on type I and type II errors of classification, particularly relevant to prevalence. CONCLUSION: PMCs allow simultaneous evaluation of the four performance metrics of sensitivity, specificity, PPV, and NPV as a function of the decision threshold. This may create a better understanding of two-class classifier performance in machine learning.
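As an illustration of the computation the abstract describes, the following is a minimal Python sketch that traces the four performance metrics (sensitivity, specificity, PPV, NPV) across decision-variable thresholds and locates a Youden-index operating point. It is not the authors' code: the function names, the "score at or above threshold counts as positive" convention, and the omission of the bootstrap 95% confidence intervals are assumptions made for illustration only.

```python
# Illustrative sketch of performance metric curves (PMCs), based on the abstract.
# Not the authors' implementation; confidence-interval estimation is omitted.
import numpy as np

def performance_metric_curves(scores, labels, thresholds=None):
    """Compute sensitivity, specificity, PPV, and NPV at each decision threshold.

    scores: 1-D array of classifier decision-variable outputs.
    labels: 1-D array of 0/1 ground truth (1 = positive class, e.g., malignant).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    if thresholds is None:
        thresholds = np.unique(scores)  # evaluate at every observed score value
    metrics = {"threshold": [], "sensitivity": [], "specificity": [],
               "ppv": [], "npv": []}
    for t in thresholds:
        pred = scores >= t                      # assumed convention: positive at/above threshold
        tp = np.sum(pred & (labels == 1))       # true positives
        fp = np.sum(pred & (labels == 0))       # false positives (type I errors)
        fn = np.sum(~pred & (labels == 1))      # false negatives (type II errors)
        tn = np.sum(~pred & (labels == 0))      # true negatives
        metrics["threshold"].append(t)
        metrics["sensitivity"].append(tp / (tp + fn) if tp + fn else np.nan)
        metrics["specificity"].append(tn / (tn + fp) if tn + fp else np.nan)
        metrics["ppv"].append(tp / (tp + fp) if tp + fp else np.nan)
        metrics["npv"].append(tn / (tn + fn) if tn + fn else np.nan)
    return {k: np.asarray(v) for k, v in metrics.items()}

def youden_threshold(pmc):
    """Example operating point: maximize Youden's J = sensitivity + specificity - 1."""
    j = pmc["sensitivity"] + pmc["specificity"] - 1.0
    return pmc["threshold"][np.nanargmax(j)]
```

Plotting the four returned arrays against the threshold axis (with CIs added, e.g., by bootstrapping the test set) would yield curves of the kind the article proposes; prevalence affects PPV and NPV but not sensitivity or specificity, which is why the PMC view separates the threshold-dependent behavior of the two pairs of metrics.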
format Online
Article
Text
id pubmed-9152992
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Society of Photo-Optical Instrumentation Engineers
record_format MEDLINE/PubMed
spelling pubmed-91529922023-05-31 Performance metric curve analysis framework to assess impact of the decision variable threshold, disease prevalence, and dataset variability in two-class classification Whitney, Heather M. Drukker, Karen Giger, Maryellen L. J Med Imaging (Bellingham) Image Perception, Observer Performance, and Technology Assessment PURPOSE: The aim of this study is to (1) demonstrate a graphical method and interpretation framework to extend performance evaluation beyond receiver operating characteristic curve analysis and (2) assess the impact of disease prevalence and variability in training and testing sets, particularly when a specific operating point is used. APPROACH: The proposed performance metric curves (PMCs) simultaneously assess sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), and the 95% confidence intervals thereof, as a function of the threshold for the decision variable. We investigated the utility of PMCs using six example operating points associated with commonly used methods to select operating points (including the Youden index and maximum mutual information). As an example, we applied PMCs to the task of distinguishing between malignant and benign breast lesions using human-engineered radiomic features extracted from dynamic contrast-enhanced magnetic resonance images. The dataset had 1885 lesions, with the images acquired in 2015 and 2016 serving as the training set (1450 lesions) and those acquired in 2017 as the test set (435 lesions). Our study used this dataset in two ways: (1) the clinical dataset itself and (2) simulated datasets with features based on the clinical set but with five different disease prevalences. The median and 95% CI of the number of type I (false positive) and type II (false negative) errors were determined for each operating point of interest. RESULTS: PMCs from both the clinical and simulated datasets demonstrated that PMCs could support interpretation of the impact of decision threshold choice on type I and type II errors of classification, particularly relevant to prevalence. CONCLUSION: PMCs allow simultaneous evaluation of the four performance metrics of sensitivity, specificity, PPV, and NPV as a function of the decision threshold. This may create a better understanding of two-class classifier performance in machine learning. Society of Photo-Optical Instrumentation Engineers 2022-05-31 2022-05 /pmc/articles/PMC9152992/ /pubmed/35656541 http://dx.doi.org/10.1117/1.JMI.9.3.035502 Text en © 2022 Society of Photo-Optical Instrumentation Engineers (SPIE)
spellingShingle Image Perception, Observer Performance, and Technology Assessment
Whitney, Heather M.
Drukker, Karen
Giger, Maryellen L.
Performance metric curve analysis framework to assess impact of the decision variable threshold, disease prevalence, and dataset variability in two-class classification
title Performance metric curve analysis framework to assess impact of the decision variable threshold, disease prevalence, and dataset variability in two-class classification
title_full Performance metric curve analysis framework to assess impact of the decision variable threshold, disease prevalence, and dataset variability in two-class classification
title_fullStr Performance metric curve analysis framework to assess impact of the decision variable threshold, disease prevalence, and dataset variability in two-class classification
title_full_unstemmed Performance metric curve analysis framework to assess impact of the decision variable threshold, disease prevalence, and dataset variability in two-class classification
title_short Performance metric curve analysis framework to assess impact of the decision variable threshold, disease prevalence, and dataset variability in two-class classification
title_sort performance metric curve analysis framework to assess impact of the decision variable threshold, disease prevalence, and dataset variability in two-class classification
topic Image Perception, Observer Performance, and Technology Assessment
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9152992/
https://www.ncbi.nlm.nih.gov/pubmed/35656541
http://dx.doi.org/10.1117/1.JMI.9.3.035502
work_keys_str_mv AT whitneyheatherm performancemetriccurveanalysisframeworktoassessimpactofthedecisionvariablethresholddiseaseprevalenceanddatasetvariabilityintwoclassclassification
AT drukkerkaren performancemetriccurveanalysisframeworktoassessimpactofthedecisionvariablethresholddiseaseprevalenceanddatasetvariabilityintwoclassclassification
AT gigermaryellenl performancemetriccurveanalysisframeworktoassessimpactofthedecisionvariablethresholddiseaseprevalenceanddatasetvariabilityintwoclassclassification