
A computational approach to compare regression modelling strategies in prediction research

BACKGROUND: It is often unclear which approach to fit, assess and adjust a model will yield the most accurate prediction model. We present an extension of an approach for comparing modelling strategies in linear regression to the setting of logistic regression and demonstrate its application in clin...


Bibliographic Details
Main Authors: Pajouheshnia, Romin, Pestman, Wiebe R., Teerenstra, Steven, Groenwold, Rolf H. H.
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4997720/
https://www.ncbi.nlm.nih.gov/pubmed/27557642
http://dx.doi.org/10.1186/s12874-016-0209-0
author Pajouheshnia, Romin
Pestman, Wiebe R.
Teerenstra, Steven
Groenwold, Rolf H. H.
collection PubMed
description BACKGROUND: It is often unclear which approach to fit, assess and adjust a model will yield the most accurate prediction model. We present an extension of an approach for comparing modelling strategies in linear regression to the setting of logistic regression and demonstrate its application in clinical prediction research. METHODS: A framework for comparing logistic regression modelling strategies by their likelihoods was formulated using a wrapper approach. Five different strategies for modelling, including simple shrinkage methods, were compared in four empirical data sets to illustrate the concept of a priori strategy comparison. Simulations were performed in both randomly generated data and empirical data to investigate the influence of data characteristics on strategy performance. We applied the comparison framework in a case study setting. Optimal strategies were selected based on the results of a priori comparisons in a clinical data set and the performance of models built according to each strategy was assessed using the Brier score and calibration plots. RESULTS: The performance of modelling strategies was highly dependent on the characteristics of the development data in both linear and logistic regression settings. A priori comparisons in four empirical data sets found that no strategy consistently outperformed the others. The percentage of times that a model adjustment strategy outperformed a logistic model ranged from 3.9 to 94.9 %, depending on the strategy and data set. However, in our case study setting the a priori selection of optimal methods did not result in detectable improvement in model performance when assessed in an external data set. CONCLUSION: The performance of prediction modelling strategies is a data-dependent process and can be highly variable between data sets within the same clinical domain. 
A priori strategy comparison can be used to determine an optimal logistic regression modelling strategy for a given data set before selecting a final modelling approach. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12874-016-0209-0) contains supplementary material, which is available to authorized users.
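The description above says that strategies were compared a priori by their likelihoods using a wrapper approach, and it reports the percentage of times an adjustment strategy outperformed a plain logistic model. The record does not spell out the exact procedure, so the following is only a minimal Python sketch under stated assumptions. It uses repeated random 70/30 development/test splits with test-set log-likelihood as the comparison criterion, and a single adjustment strategy: the heuristic uniform shrinkage factor s = (chi2 - df) / chi2 of van Houwelingen and le Cessie, with the intercept re-estimated afterwards. That shrinkage method is a swapped-in example and is not necessarily one of the five strategies in the article. All function names (`fit_logistic`, `heuristic_shrinkage`, `compare`, `brier`) and the simulated data are hypothetical. The `brier` helper computes the Brier score, the performance measure named in the abstract.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict(beta, X):
    # beta[0] is the intercept; each row of X holds the predictor values.
    return [sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row))) for row in X]

def log_lik(beta, X, y):
    # Bernoulli log-likelihood, with probabilities clamped away from 0 and 1.
    eps = 1e-12
    return sum(yi * math.log(max(p, eps)) + (1 - yi) * math.log(max(1 - p, eps))
               for p, yi in zip(predict(beta, X), y))

def brier(beta, X, y):
    # Brier score: mean squared difference between predicted probability and outcome.
    return sum((p - yi) ** 2 for p, yi in zip(predict(beta, X), y)) / len(y)

def solve(A, b):
    # Gaussian elimination with partial pivoting (fine for a handful of predictors).
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_logistic(X, y, iters=25):
    # Plain maximum-likelihood logistic regression via Newton-Raphson.
    Z = [[1.0] + list(row) for row in X]
    k = len(Z[0])
    beta = [0.0] * k
    for _ in range(iters):
        p = predict(beta, X)
        grad = [sum((yi - pi) * z[j] for z, yi, pi in zip(Z, y, p)) for j in range(k)]
        info = [[sum(pi * (1 - pi) * z[i] * z[j] for z, pi in zip(Z, p)) for j in range(k)]
                for i in range(k)]
        step = solve(info, grad)
        beta = [b + d for b, d in zip(beta, step)]
        if max(abs(d) for d in step) < 1e-10:
            break
    return beta

def heuristic_shrinkage(X, y):
    # Shrink ML slopes by s = (chi2 - df) / chi2, then re-estimate the intercept so
    # the mean predicted probability equals the observed event rate.
    beta = fit_logistic(X, y)
    p0 = sum(y) / len(y)  # assumes both outcome classes are present
    ll_null = sum(yi * math.log(p0) + (1 - yi) * math.log(1 - p0) for yi in y)
    chi2 = 2.0 * (log_lik(beta, X, y) - ll_null)
    df = len(beta) - 1
    s = max(0.0, (chi2 - df) / chi2) if chi2 > 0 else 0.0
    slopes = [s * b for b in beta[1:]]
    offset = [sum(b * x for b, x in zip(slopes, row)) for row in X]
    a = beta[0]
    for _ in range(25):  # 1-D Newton steps on the intercept only
        p = [sigmoid(a + o) for o in offset]
        a += sum(yi - pi for yi, pi in zip(y, p)) / sum(pi * (1 - pi) for pi in p)
    return [a] + slopes

def compare(X, y, n_splits=50, frac=0.7, seed=1):
    # Percentage of random development/test splits in which shrinkage gives a
    # higher test-set log-likelihood than the plain ML logistic model.
    rng = random.Random(seed)
    idx = list(range(len(y)))
    wins = 0
    for _ in range(n_splits):
        rng.shuffle(idx)
        cut = int(frac * len(idx))
        dev, test = idx[:cut], idx[cut:]
        Xd, yd = [X[i] for i in dev], [y[i] for i in dev]
        Xt, yt = [X[i] for i in test], [y[i] for i in test]
        if log_lik(heuristic_shrinkage(Xd, yd), Xt, yt) > log_lik(fit_logistic(Xd, yd), Xt, yt):
            wins += 1
    return 100.0 * wins / n_splits

if __name__ == "__main__":
    rng = random.Random(0)  # hypothetical simulated data: 100 subjects, 3 predictors
    X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(100)]
    y = [int(rng.random() < sigmoid(-0.5 + 0.8 * r[0] - 0.4 * r[1])) for r in X]
    print(f"shrinkage beat ML in {compare(X, y):.1f}% of splits")
    print(f"apparent Brier score (ML, all data): {brier(fit_logistic(X, y), X, y):.3f}")
```

When run as a script, the sketch prints the percentage of splits in which the shrunken model had the higher test log-likelihood, along with the apparent Brier score of the full-data ML fit. The exact numbers depend on the seed and on the hypothetical data. In the spirit of the abstract's conclusion, running the same comparison on a different data set can favour a different strategy.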
format Online
Article
Text
id pubmed-4997720
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4997720 2016-08-26 A computational approach to compare regression modelling strategies in prediction research Pajouheshnia, Romin Pestman, Wiebe R. Teerenstra, Steven Groenwold, Rolf H. H. BMC Med Res Methodol Research Article BioMed Central 2016-08-25 /pmc/articles/PMC4997720/ /pubmed/27557642 http://dx.doi.org/10.1186/s12874-016-0209-0 Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A computational approach to compare regression modelling strategies in prediction research
topic Research Article