A standardized analytics pipeline for reliable and rapid development and validation of prediction models using observational health data

BACKGROUND AND OBJECTIVE: As a response to the ongoing COVID-19 pandemic, several prediction models in the existing literature were rapidly developed, with the aim of providing evidence-based guidance. However, none of these COVID-19 prediction models have been found to be reliable. Models are commo...

Bibliographic Details
Main Authors: Khalid, Sara, Yang, Cynthia, Blacketer, Clair, Duarte-Salles, Talita, Fernández-Bertolín, Sergio, Kim, Chungsoo, Park, Rae Woong, Park, Jimyung, Schuemie, Martijn J., Sena, Anthony G., Suchard, Marc A., You, Seng Chan, Rijnbeek, Peter R., Reps, Jenna M.
Format: Online Article Text
Language: English
Published: The Author(s). Published by Elsevier B.V. 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8420135/
https://www.ncbi.nlm.nih.gov/pubmed/34560604
http://dx.doi.org/10.1016/j.cmpb.2021.106394
_version_ 1783748893022879744
author Khalid, Sara
Yang, Cynthia
Blacketer, Clair
Duarte-Salles, Talita
Fernández-Bertolín, Sergio
Kim, Chungsoo
Park, Rae Woong
Park, Jimyung
Schuemie, Martijn J.
Sena, Anthony G.
Suchard, Marc A.
You, Seng Chan
Rijnbeek, Peter R.
Reps, Jenna M.
collection PubMed
description BACKGROUND AND OBJECTIVE: As a response to the ongoing COVID-19 pandemic, several prediction models in the existing literature were rapidly developed, with the aim of providing evidence-based guidance. However, none of these COVID-19 prediction models have been found to be reliable. Models are commonly assessed to have a risk of bias, often due to insufficient reporting, use of non-representative data, and lack of large-scale external validation. In this paper, we present the Observational Health Data Sciences and Informatics (OHDSI) analytics pipeline for patient-level prediction modeling as a standardized approach for rapid yet reliable development and validation of prediction models. We demonstrate how our analytics pipeline and open-source software tools can be used to answer important prediction questions while limiting potential causes of bias (e.g., by validating phenotypes, specifying the target population, performing large-scale external validation, and publicly providing all analytical source code). METHODS: We show step-by-step how to implement the analytics pipeline for the question: ‘In patients hospitalized with COVID-19, what is the risk of death 0 to 30 days after hospitalization?’. We develop models using six different machine learning methods in a USA claims database containing over 20,000 COVID-19 hospitalizations and externally validate the models using data containing over 45,000 COVID-19 hospitalizations from South Korea, Spain, and the USA. RESULTS: Our open-source software tools enabled us to efficiently go end-to-end from problem design to reliable model development and evaluation. When predicting death in patients hospitalized with COVID-19, AdaBoost, random forest, gradient boosting machine, and decision tree yielded similar or lower internal and external validation discrimination performance compared to L1-regularized logistic regression, whereas the MLP neural network consistently resulted in lower discrimination. L1-regularized logistic regression models were well calibrated. CONCLUSION: Our results show that following the OHDSI analytics pipeline for patient-level prediction modelling can enable the rapid development towards reliable prediction models. The OHDSI software tools and pipeline are open source and available to researchers from all around the world.
format Online
Article
Text
id pubmed-8420135
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher The Author(s). Published by Elsevier B.V.
record_format MEDLINE/PubMed
spelling pubmed-84201352021-09-07 A standardized analytics pipeline for reliable and rapid development and validation of prediction models using observational health data Khalid, Sara Yang, Cynthia Blacketer, Clair Duarte-Salles, Talita Fernández-Bertolín, Sergio Kim, Chungsoo Park, Rae Woong Park, Jimyung Schuemie, Martijn J. Sena, Anthony G. Suchard, Marc A. You, Seng Chan Rijnbeek, Peter R. Reps, Jenna M. Comput Methods Programs Biomed Article BACKGROUND AND OBJECTIVE: As a response to the ongoing COVID-19 pandemic, several prediction models in the existing literature were rapidly developed, with the aim of providing evidence-based guidance. However, none of these COVID-19 prediction models have been found to be reliable. Models are commonly assessed to have a risk of bias, often due to insufficient reporting, use of non-representative data, and lack of large-scale external validation. In this paper, we present the Observational Health Data Sciences and Informatics (OHDSI) analytics pipeline for patient-level prediction modeling as a standardized approach for rapid yet reliable development and validation of prediction models. We demonstrate how our analytics pipeline and open-source software tools can be used to answer important prediction questions while limiting potential causes of bias (e.g., by validating phenotypes, specifying the target population, performing large-scale external validation, and publicly providing all analytical source code). METHODS: We show step-by-step how to implement the analytics pipeline for the question: ‘In patients hospitalized with COVID-19, what is the risk of death 0 to 30 days after hospitalization?’. We develop models using six different machine learning methods in a USA claims database containing over 20,000 COVID-19 hospitalizations and externally validate the models using data containing over 45,000 COVID-19 hospitalizations from South Korea, Spain, and the USA. 
RESULTS: Our open-source software tools enabled us to efficiently go end-to-end from problem design to reliable model development and evaluation. When predicting death in patients hospitalized with COVID-19, AdaBoost, random forest, gradient boosting machine, and decision tree yielded similar or lower internal and external validation discrimination performance compared to L1-regularized logistic regression, whereas the MLP neural network consistently resulted in lower discrimination. L1-regularized logistic regression models were well calibrated. CONCLUSION: Our results show that following the OHDSI analytics pipeline for patient-level prediction modelling can enable the rapid development towards reliable prediction models. The OHDSI software tools and pipeline are open source and available to researchers from all around the world. The Author(s). Published by Elsevier B.V. 2021-11 2021-09-06 /pmc/articles/PMC8420135/ /pubmed/34560604 http://dx.doi.org/10.1016/j.cmpb.2021.106394 Text en © 2021 The Author(s). Published by Elsevier B.V. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
title A standardized analytics pipeline for reliable and rapid development and validation of prediction models using observational health data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8420135/
https://www.ncbi.nlm.nih.gov/pubmed/34560604
http://dx.doi.org/10.1016/j.cmpb.2021.106394