
Developing risk models for multicenter data using standard logistic regression produced suboptimal predictions: A simulation study

Although multicenter data are common, many prediction model studies ignore this during model development. The objective of this study is to evaluate the predictive performance of regression methods for developing clinical risk prediction models using multicenter data, and provide guidelines for prac...


Bibliographic Details
Main Authors:	Falconieri, Nora; Van Calster, Ben; Timmerman, Dirk; Wynants, Laure
Format:	Online Article Text
Language:	English
Published:	John Wiley and Sons Inc. 2020
Subjects:	Risk Prediction
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7383814/ https://www.ncbi.nlm.nih.gov/pubmed/31957077 http://dx.doi.org/10.1002/bimj.201900075

author	Falconieri, Nora; Van Calster, Ben; Timmerman, Dirk; Wynants, Laure
author_facet	Falconieri, Nora; Van Calster, Ben; Timmerman, Dirk; Wynants, Laure
author_sort	Falconieri, Nora
collection	PubMed
description	Although multicenter data are common, many prediction model studies ignore this during model development. The objective of this study is to evaluate the predictive performance of regression methods for developing clinical risk prediction models using multicenter data, and provide guidelines for practice. We compared the predictive performance of standard logistic regression, generalized estimating equations, random intercept logistic regression, and fixed effects logistic regression. First, we presented a case study on the diagnosis of ovarian cancer. Subsequently, a simulation study investigated the performance of the different models as a function of the amount of clustering, development sample size, distribution of center‐specific intercepts, the presence of a center‐predictor interaction, and the presence of a dependency between center effects and predictors. The results showed that when sample sizes were sufficiently large, conditional models yielded calibrated predictions, whereas marginal models yielded miscalibrated predictions. Small sample sizes led to overfitting and unreliable predictions. This miscalibration was worse with more heavily clustered data. Calibration of random intercept logistic regression was better than that of standard logistic regression even when center‐specific intercepts were not normally distributed, a center‐predictor interaction was present, center effects and predictors were dependent, or when the model was applied in a new center. Therefore, to make reliable predictions in a specific center, we recommend random intercept logistic regression.
format	Online Article Text
id	pubmed-7383814
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	John Wiley and Sons Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-7383814 2020-07-27 Biom J Risk Prediction John Wiley and Sons Inc. 2020-01-20 2020-07 /pmc/articles/PMC7383814/ /pubmed/31957077 http://dx.doi.org/10.1002/bimj.201900075 Text en © 2020 The Authors. Biometrical Journal published by WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim. This is an open access article under the terms of the http://creativecommons.org/licenses/by-nc/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
title	Developing risk models for multicenter data using standard logistic regression produced suboptimal predictions: A simulation study
topic	Risk Prediction
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7383814/ https://www.ncbi.nlm.nih.gov/pubmed/31957077 http://dx.doi.org/10.1002/bimj.201900075
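The description above contrasts marginal models (standard logistic regression) with conditional models (random intercept logistic regression). The following is a minimal sketch of one reason a marginal model can be miscalibrated for center-specific risks: when outcomes are generated with center-specific intercepts, a standard logistic regression that ignores center tends to recover a predictor coefficient that is attenuated toward zero relative to the conditional (within-center) coefficient. This is not the authors' simulation design. The function names and every parameter value (100 centers, 200 patients per center, intercept SD 1.5, true slope 1.0) are illustrative assumptions.

```python
import math
import random


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n_centers=100, n_per_center=200, beta0=-1.0, beta1=1.0,
             sigma=1.5, seed=1):
    """Simulate clustered binary outcomes (all values are assumptions).

    Each center gets its own intercept shift a ~ N(0, sigma^2); within a
    center, the log-odds of the outcome is beta0 + a + beta1 * x.
    """
    rng = random.Random(seed)
    rows = []
    for center in range(n_centers):
        a = rng.gauss(0.0, sigma)  # center-specific intercept shift
        for _ in range(n_per_center):
            x = rng.gauss(0.0, 1.0)
            p = expit(beta0 + a + beta1 * x)
            y = 1 if rng.random() < p else 0
            rows.append((center, x, y))
    return rows


def fit_standard_logistic(rows, n_iter=25):
    """Fit y ~ x by Newton-Raphson, ignoring the center identifier."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for _, x, y in rows:
            p = expit(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # score for the intercept
            g1 += (y - p) * x      # score for the slope
            h00 += w               # entries of the information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * delta = score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


if __name__ == "__main__":
    data = simulate()
    b0_hat, b1_hat = fit_standard_logistic(data)
    print("true conditional slope: 1.0")
    print("slope from standard logistic regression:", round(b1_hat, 3))
```

The fitted slope describes the population-averaged association, so applying it to an individual center generally gives predictions that are too moderate. A random intercept model, fitted with dedicated mixed-model software, targets the conditional coefficient instead; that fit is outside the scope of this sketch.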

