The effects of misclassification in routine healthcare databases on the accuracy of prognostic prediction models: a case study of the CHA2DS2-VASc score in atrial fibrillation

BACKGROUND: Research on prognostic prediction models frequently uses data from routine healthcare. However, potential misclassification of predictors when using such data may strongly affect the studied associations. There is no doubt that such misclassification could lead to the derivation of subop...

Bibliographic Details
Main authors: van Doorn, S., Brakenhoff, T. B., Moons, K. G. M., Rutten, F. H., Hoes, A. W., Groenwold, R. H. H., Geersing, G. J.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6460749/
https://www.ncbi.nlm.nih.gov/pubmed/31093547
http://dx.doi.org/10.1186/s41512-017-0018-x
_version_ 1783410375056687104
author van Doorn, S.
Brakenhoff, T. B.
Moons, K. G. M.
Rutten, F. H.
Hoes, A. W.
Groenwold, R. H. H.
Geersing, G. J.
author_sort van Doorn, S.
collection PubMed
description BACKGROUND: Research on prognostic prediction models frequently uses data from routine healthcare. However, potential misclassification of predictors when using such data may strongly affect the studied associations. There is no doubt that such misclassification could lead to the derivation of suboptimal prediction models. The extent to which misclassification affects the validation of existing prediction models is currently unclear. We aimed to quantify the amount of misclassification in routine care data and its effect on the validation of the existing risk prediction model. As an illustrative example, we validated the CHA2DS2-VASc prediction rule for predicting mortality in patients with atrial fibrillation (AF).
METHODS: In a prospective cohort in general practice in the Netherlands, we used computerized retrieved data from the electronic medical records of patients known with AF as index predictors. Additionally, manually collected data after scrutinizing all complete medical files were used as reference predictors. Comparing the index with the reference predictors, we assessed misclassification in individual predictors by calculating Cohen’s kappas and other diagnostic test accuracy measures. Predictive performance was quantified by the c-statistic and by determining calibration of multivariable models.
RESULTS: In total, 2363 AF patients were included. After a median follow-up of 2.7 (IQR 2.3–3.0) years, 368 patients died (incidence rate 6.2 deaths per 100 person-years). Misclassification in individual predictors ranged from substantial (Cohen’s kappa 0.56 for prior history of heart failure) to minor (kappa 0.90 for a history of type 2 diabetes). The overall model performance was not affected when using either index or reference predictors, with a c-statistic of 0.684 and 0.681, respectively, and similar calibration.
CONCLUSION: In a case study validating the CHA2DS2-VASc prediction model, we found substantial predictor misclassification in routine healthcare data with only limited effect on overall model performance. Our study should be repeated for other often applied prediction models to further evaluate the usefulness of routinely available healthcare data for validating prognostic models in the presence of predictor misclassification.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s41512-017-0018-x) contains supplementary material, which is available to authorized users.
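Note: the abstract reports agreement as Cohen's kappa and discrimination as the c-statistic. The sketch below shows, under simplifying assumptions, how these two measures can be computed for binary data. It is not the authors' analysis code: the function names and the toy inputs are hypothetical, and the c-statistic here is the plain binary-outcome version, which ignores censoring and follow-up time.

```python
def cohens_kappa(index, reference):
    """Cohen's kappa for two binary (0/1) codings of the same patients."""
    n = len(index)
    if n == 0 or n != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    # Observed agreement: share of patients coded identically by both sources.
    observed = sum(i == r for i, r in zip(index, reference)) / n
    # Chance agreement, from the marginal prevalence in each source.
    p_index = sum(index) / n
    p_ref = sum(reference) / n
    expected = p_index * p_ref + (1 - p_index) * (1 - p_ref)
    if expected == 1.0:
        return 1.0  # both codings are constant and identical
    return (observed - expected) / (1 - expected)


def c_statistic(scores, outcomes):
    """Share of (event, non-event) pairs in which the event has the
    higher score; tied scores count as half a concordant pair."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for ne in non_events:
            if e > ne:
                concordant += 1.0
            elif e == ne:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    index_predictor = [1, 1, 0, 0, 1, 0, 0, 0]      # e.g. coded in the EMR
    reference_predictor = [1, 1, 1, 0, 0, 0, 0, 0]  # e.g. manual file review
    print(cohens_kappa(index_predictor, reference_predictor))

    risk_scores = [1, 2, 3, 4, 5, 2]  # e.g. CHA2DS2-VASc points
    died = [0, 0, 1, 0, 1, 1]
    print(c_statistic(risk_scores, died))
```

Computing `c_statistic` once with scores built from index predictors and once with scores built from reference predictors mirrors the comparison described in the abstract (0.684 versus 0.681).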
format Online
Article
Text
id pubmed-6460749
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6460749 2019-05-15 The effects of misclassification in routine healthcare databases on the accuracy of prognostic prediction models: a case study of the CHA2DS2-VASc score in atrial fibrillation van Doorn, S. Brakenhoff, T. B. Moons, K. G. M. Rutten, F. H. Hoes, A. W. Groenwold, R. H. H. Geersing, G. J. Diagn Progn Res Research BioMed Central 2017-11-16 /pmc/articles/PMC6460749/ /pubmed/31093547 http://dx.doi.org/10.1186/s41512-017-0018-x Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title The effects of misclassification in routine healthcare databases on the accuracy of prognostic prediction models: a case study of the CHA2DS2-VASc score in atrial fibrillation
title_sort effects of misclassification in routine healthcare databases on the accuracy of prognostic prediction models: a case study of the cha2ds2-vasc score in atrial fibrillation
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6460749/
https://www.ncbi.nlm.nih.gov/pubmed/31093547
http://dx.doi.org/10.1186/s41512-017-0018-x