
Statistical approaches to identifying significant differences in predictive performance between machine learning and classical statistical models for survival data

Research that seeks to compare two predictive models requires a thorough statistical approach to draw valid inferences about comparisons between the performance of the two models. Researchers present estimates of model performance with little evidence on whether they reflect true differences in mode...


Bibliographic Details
Main authors: Nasejje, Justine B., Whata, Albert, Chimedza, Charles
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9797100/
https://www.ncbi.nlm.nih.gov/pubmed/36576910
http://dx.doi.org/10.1371/journal.pone.0279435
author Nasejje, Justine B.
Whata, Albert
Chimedza, Charles
collection PubMed
description Research that seeks to compare two predictive models requires a thorough statistical approach to draw valid inferences about comparisons between the performance of the two models. Researchers present estimates of model performance with little evidence on whether they reflect true differences in model performance. In this study, we apply two statistical tests, that is, the 5 × 2-fold cv paired t-test, and the combined 5 × 2-fold cv F-test to provide statistical evidence on differences in predictive performance between the Fine-Gray (FG) and random survival forest (RSF) models for competing risks. These models are trained on different scenarios of low-dimensional simulated survival data to determine whether the differences in their predictive performance that exist are indeed significant. Each simulation was repeated one hundred times on ten different seeds. The results indicate that the RSF model is superior in predictive performance in the presence of complex relationships (quadratic and interactions) between the outcome and its predictors. The two statistical tests show that the differences in performance are significant in quadratic simulation but not significant in interaction simulations. The study has also revealed that the FG model is superior in predictive performance in linear simulations and its differences in predictive performance compared to the RSF model are significant. The combined 5 × 2-fold cv F-test has lower type I error rates compared to the 5 × 2-fold cv paired t-test.
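The two tests named in the description can be illustrated with a minimal sketch. It assumes the standard textbook definitions of the 5 × 2-fold cv paired t-test (Dietterich) and the combined 5 × 2-fold cv F-test (Alpaydin), which may differ in detail from the authors' implementation. The function name `five_by_two_cv_stats`, the input layout, and the hard-coded 5% critical values are assumptions made for illustration; the performance metric behind each difference is left open because the abstract does not name it.

```python
from math import sqrt

# 5% critical values taken from standard tables (assumed, not from the record):
T_CRIT_DF5 = 2.571    # two-sided t distribution, 5 degrees of freedom
F_CRIT_10_5 = 4.74    # upper tail of F distribution, (10, 5) degrees of freedom


def five_by_two_cv_stats(diffs):
    """Return (t_stat, f_stat) for 5 replications of 2-fold cross-validation.

    diffs: five (p1, p2) pairs, one pair per replication, where each value is
    the performance difference (model A minus model B) measured on one fold.
    """
    if len(diffs) != 5 or any(len(pair) != 2 for pair in diffs):
        raise ValueError("expected 5 replications with 2 fold differences each")

    # Per-replication variance estimate: s_i^2 = (p1 - mean)^2 + (p2 - mean)^2
    variances = []
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2.0
        variances.append((p1 - mean) ** 2 + (p2 - mean) ** 2)
    total_var = sum(variances)
    if total_var == 0:
        raise ValueError("fold differences are equal within every replication")

    # Paired t-test: first difference over the pooled standard deviation.
    t_stat = diffs[0][0] / sqrt(total_var / 5.0)
    # Combined F-test: all ten squared differences over twice the pooled variance.
    f_stat = sum(p1 ** 2 + p2 ** 2 for p1, p2 in diffs) / (2.0 * total_var)
    return t_stat, f_stat


if __name__ == "__main__":
    # Hypothetical fold-wise differences, for illustration only.
    example = [(0.02, 0.04), (0.03, 0.01), (0.05, 0.03), (0.02, 0.02), (0.04, 0.02)]
    t_stat, f_stat = five_by_two_cv_stats(example)
    print("t-test rejects at 5%:", abs(t_stat) > T_CRIT_DF5)
    print("F-test rejects at 5%:", f_stat > F_CRIT_10_5)
```

The t statistic uses only the first of the ten differences in its numerator, whereas the F statistic pools all ten, which is one commonly cited reason the two tests can disagree on the same data.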
format Online
Article
Text
id pubmed-9797100
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9797100 2022-12-29 PLoS One Research Article
Public Library of Science 2022-12-28 /pmc/articles/PMC9797100/ /pubmed/36576910 http://dx.doi.org/10.1371/journal.pone.0279435 Text en © 2022 Nasejje et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Statistical approaches to identifying significant differences in predictive performance between machine learning and classical statistical models for survival data
topic Research Article