A machine learning-based phenotype for long COVID in children: an EHR-based study from the RECOVER program

Bibliographic Details
Main authors: Lorman, Vitaly; Razzaghi, Hanieh; Song, Xing; Morse, Keith; Utidjian, Levon; Allen, Andrea J.; Rao, Suchitra; Rogerson, Colin; Bennett, Tellen D.; Morizono, Hiroki; Eckrich, Daniel; Jhaveri, Ravi; Huang, Yungui; Ranade, Daksha; Pajor, Nathan; Lee, Grace M.; Forrest, Christopher B.; Bailey, L. Charles
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9810222/
https://www.ncbi.nlm.nih.gov/pubmed/36597534
http://dx.doi.org/10.1101/2022.12.22.22283791
author Lorman, Vitaly
Razzaghi, Hanieh
Song, Xing
Morse, Keith
Utidjian, Levon
Allen, Andrea J.
Rao, Suchitra
Rogerson, Colin
Bennett, Tellen D.
Morizono, Hiroki
Eckrich, Daniel
Jhaveri, Ravi
Huang, Yungui
Ranade, Daksha
Pajor, Nathan
Lee, Grace M.
Forrest, Christopher B.
Bailey, L. Charles
collection PubMed
description BACKGROUND: As clinical understanding of pediatric post-acute sequelae of SARS-CoV-2 (PASC) develops, and the clinical definition evolves with it, it is desirable to have a method to reliably identify patients who are likely to have PASC in health systems data. METHODS AND FINDINGS: In this study, we developed and validated a machine learning algorithm to classify which patients have PASC (distinguishing between Multisystem Inflammatory Syndrome in Children (MIS-C) and non-MIS-C variants) from a cohort of patients with positive SARS-CoV-2 test results in pediatric health systems within the PEDSnet EHR network. Patient features included in the model were selected from conditions, procedures, performance of diagnostic testing, and medications using a tree-based scan statistic approach. We used an XGBoost model, with hyperparameters selected through cross-validated grid search, and model performance was assessed using 5-fold cross-validation. Model predictions and feature importance were evaluated using SHapley Additive exPlanations (SHAP) values. CONCLUSIONS: The model provides a tool for identifying patients with PASC and an approach to characterizing PASC using diagnosis, medication, laboratory, and procedure features in health systems data. Using appropriate threshold settings, the model can be used to identify PASC patients in health systems data at higher precision for inclusion in studies or at higher recall in screening for clinical trials, especially in settings where PASC diagnosis codes are used less frequently or less reliably. Analysis of how specific features contribute to the classification process may assist in gaining a better understanding of features that are associated with PASC diagnoses.
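The abstract's point about threshold settings (higher precision for study inclusion, higher recall for trial screening) can be illustrated with a minimal sketch. It is not the authors' code and uses no real data: the scores, labels, and function names below are hypothetical, and the scores stand in for the predicted probabilities that any classifier, such as the XGBoost model described above, would produce.

```python
from typing import List, Optional, Tuple

# Hypothetical model scores (predicted probability of PASC) and true labels.
# These numbers are invented for illustration only.
scores: List[float] = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels: List[int] = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]


def precision_recall_at(
    scores: List[float], labels: List[int], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when every score >= threshold is flagged as positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def lowest_threshold_with_precision(
    scores: List[float], labels: List[int], min_precision: float
) -> Optional[float]:
    """Lowest observed score that, used as threshold, meets the precision target.

    Recall never increases as the threshold rises, so the lowest qualifying
    threshold is also the one with the highest recall among those that qualify.
    """
    for t in sorted(set(scores)):
        precision, _ = precision_recall_at(scores, labels, t)
        if precision >= min_precision:
            return t
    return None


def highest_threshold_with_recall(
    scores: List[float], labels: List[int], min_recall: float
) -> Optional[float]:
    """Highest observed score that, used as threshold, still meets the recall target.

    This flags the fewest patients while keeping recall at or above the target.
    """
    for t in sorted(set(scores), reverse=True):
        _, recall = precision_recall_at(scores, labels, t)
        if recall >= min_recall:
            return t
    return None


# Precision-oriented setting (e.g. selecting a study cohort).
study_threshold = lowest_threshold_with_precision(scores, labels, 0.75)
# Recall-oriented setting (e.g. screening candidates for a trial).
screening_threshold = highest_threshold_with_recall(scores, labels, 1.0)

print("study threshold:", study_threshold)
print("screening threshold:", screening_threshold)
```

The two helpers scan only thresholds that occur among the observed scores, which is enough to cover every distinct precision/recall trade-off on a finite sample. In practice the targets (0.75 precision, 1.0 recall here) would be chosen for the use case and checked on held-out data.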
format Online
Article
Text
id pubmed-9810222
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-9810222 2023-01-04 medRxiv Article Cold Spring Harbor Laboratory 2022-12-26 /pmc/articles/PMC9810222/ /pubmed/36597534 http://dx.doi.org/10.1101/2022.12.22.22283791 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.
title A machine learning-based phenotype for long COVID in children: an EHR-based study from the RECOVER program
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9810222/
https://www.ncbi.nlm.nih.gov/pubmed/36597534
http://dx.doi.org/10.1101/2022.12.22.22283791