
Validation of prediction models based on lasso regression with multiply imputed data

Bibliographic Details
Main authors:	Musoro, Jammbe Z, Zwinderman, Aeilko H, Puhan, Milo A, ter Riet, Gerben, Geskus, Ronald B
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4209042/ https://www.ncbi.nlm.nih.gov/pubmed/25323009 http://dx.doi.org/10.1186/1471-2288-14-116

author	Musoro, Jammbe Z Zwinderman, Aeilko H Puhan, Milo A ter Riet, Gerben Geskus, Ronald B
collection	PubMed
description	BACKGROUND: In prognostic studies, the lasso technique is attractive since it improves the quality of predictions by shrinking regression coefficients, compared to predictions based on a model fitted via unpenalized maximum likelihood. Since some coefficients are set to zero, parsimony is achieved as well. It is unclear whether the performance of a model fitted using the lasso still shows some optimism. Bootstrap methods have been advocated to quantify optimism and generalize model performance to new subjects. It is unclear how resampling should be performed in the presence of multiply imputed data. METHOD: The data were based on a cohort of Chronic Obstructive Pulmonary Disease patients. We constructed models to predict Chronic Respiratory Questionnaire dyspnea 6 months ahead. Optimism of the lasso model was investigated by comparing 4 approaches of handling multiply imputed data in the bootstrap procedure, using the study data and simulated data sets. In the first 3 approaches, data sets that had been completed via multiple imputation (MI) were resampled, while the fourth approach resampled the incomplete data set and then performed MI. RESULTS: The discriminative model performance of the lasso was optimistic. There was suboptimal calibration due to over-shrinkage. The estimate of optimism was sensitive to the choice of handling imputed data in the bootstrap resampling procedure. Resampling the completed data sets underestimates optimism, especially if, within a bootstrap step, selected individuals differ over the imputed data sets. Incorporating the MI procedure in the validation yields estimates of optimism that are closer to the true value, albeit slightly too large. CONCLUSION: Performance of prognostic models constructed using the lasso technique can be optimistic as well. Results of the internal validation are sensitive to how bootstrap resampling is performed.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2288-14-116) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-4209042
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4209042 2014-10-28 Validation of prediction models based on lasso regression with multiply imputed data Musoro, Jammbe Z Zwinderman, Aeilko H Puhan, Milo A ter Riet, Gerben Geskus, Ronald B BMC Med Res Methodol Research Article BioMed Central 2014-10-16 /pmc/articles/PMC4209042/ /pubmed/25323009 http://dx.doi.org/10.1186/1471-2288-14-116 Text en © Musoro et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Validation of prediction models based on lasso regression with multiply imputed data
topic	Research Article
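
The fourth approach named in the abstract (resample the incomplete data, then impute within each bootstrap sample) can be illustrated with a rough sketch. This is not the authors' implementation. Everything below is an assumption made for illustration: the function names are hypothetical, R-squared stands in for the performance measure, the lasso penalty `lam` is fixed rather than tuned, a random hot-deck draw stands in for a proper multiple imputation model, and the bootstrap models are tested on imputed copies of the original data.

```python
import random

def soft(z, lam):
    """Soft-thresholding operator used by the lasso coordinate update."""
    return z - lam if z > lam else z + lam if z < -lam else 0.0

def lasso_fit(X, y, lam, n_iter=50):
    """Cyclic coordinate descent for (1/2n)*||y - b0 - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        resid = [y[i] - b0 - sum(X[i][k] * beta[k] for k in range(p)) for i in range(n)]
        shift = sum(resid) / n          # unpenalized intercept update
        b0 += shift
        resid = [r - shift for r in resid]
        for j in range(p):
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            if z == 0:
                continue
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft(rho, lam) / z
            resid = [resid[i] - X[i][j] * (new - beta[j]) for i in range(n)]
            beta[j] = new
    return b0, beta

def predict(model, X):
    b0, beta = model
    return [b0 + sum(b * x for b, x in zip(beta, row)) for row in X]

def r_squared(y, yhat):
    ybar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - ybar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def impute(X, rng):
    """ASSUMPTION: crude hot-deck stand-in for MI; fills None with a random observed value."""
    p = len(X[0])
    pools = [[row[j] for row in X if row[j] is not None] for j in range(p)]
    return [[v if v is not None else (rng.choice(pools[j]) if pools[j] else 0.0)
             for j, v in enumerate(row)] for row in X]

def optimism_mi_inside_bootstrap(X, y, lam=0.1, B=20, m=3, seed=0):
    """Bootstrap optimism where imputation is repeated inside every bootstrap sample."""
    rng = random.Random(seed)
    n = len(y)
    originals = [impute(X, rng) for _ in range(m)]   # m completed copies of the original data
    apparent = sum(r_squared(y, predict(lasso_fit(Xc, y, lam), Xc)) for Xc in originals) / m
    optimism = 0.0
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]   # resample the INCOMPLETE rows
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        for _ in range(m):
            Xbc = impute(Xb, rng)                    # impute AFTER resampling
            model = lasso_fit(Xbc, yb, lam)
            boot_perf = r_squared(yb, predict(model, Xbc))
            test_perf = sum(r_squared(y, predict(model, Xo)) for Xo in originals) / m
            optimism += (boot_perf - test_perf) / (B * m)
    return apparent, optimism, apparent - optimism

def make_demo_data(n=60, seed=1, miss=0.15):
    """Simulated data with values missing completely at random (illustrative only)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(3)]
        y.append(1.0 + 2.0 * row[0] - 1.0 * row[1] + rng.gauss(0, 1))
        X.append([None if rng.random() < miss else v for v in row])
    return X, y
```

Calling `optimism_mi_inside_bootstrap(*make_demo_data())` returns the apparent performance, the estimated optimism, and the optimism-corrected performance. Moving the `impute` call outside the bootstrap loop, so that already completed data sets are resampled, would correspond more closely to the first three approaches the abstract reports as underestimating optimism.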
