A machine learning-based phenotype for long COVID in children: an EHR-based study from the RECOVER program
BACKGROUND: As clinical understanding of pediatric Post-Acute Sequelae of SARS-CoV-2 (PASC) develops, and hence the clinical definition evolves, it is desirable to have a method to reliably identify patients who are likely to have PASC in health systems data. METH...
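Main Authors: | Lorman, Vitaly; Razzaghi, Hanieh; Song, Xing; Morse, Keith; Utidjian, Levon; Allen, Andrea J.; Rao, Suchitra; Rogerson, Colin; Bennett, Tellen D.; Morizono, Hiroki; Eckrich, Daniel; Jhaveri, Ravi; Huang, Yungui; Ranade, Daksha; Pajor, Nathan; Lee, Grace M.; Forrest, Christopher B.; Bailey, L. Charles |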
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9810222/ https://www.ncbi.nlm.nih.gov/pubmed/36597534 http://dx.doi.org/10.1101/2022.12.22.22283791 |
_version_ | 1784863265838333952 |
---|---|
author | Lorman, Vitaly Razzaghi, Hanieh Song, Xing Morse, Keith Utidjian, Levon Allen, Andrea J. Rao, Suchitra Rogerson, Colin Bennett, Tellen D. Morizono, Hiroki Eckrich, Daniel Jhaveri, Ravi Huang, Yungui Ranade, Daksha Pajor, Nathan Lee, Grace M. Forrest, Christopher B. Bailey, L. Charles |
author_facet | Lorman, Vitaly Razzaghi, Hanieh Song, Xing Morse, Keith Utidjian, Levon Allen, Andrea J. Rao, Suchitra Rogerson, Colin Bennett, Tellen D. Morizono, Hiroki Eckrich, Daniel Jhaveri, Ravi Huang, Yungui Ranade, Daksha Pajor, Nathan Lee, Grace M. Forrest, Christopher B. Bailey, L. Charles |
author_sort | Lorman, Vitaly |
collection | PubMed |
description | BACKGROUND: As clinical understanding of pediatric Post-Acute Sequelae of SARS-CoV-2 (PASC) develops, and hence the clinical definition evolves, it is desirable to have a method to reliably identify patients who are likely to have PASC in health systems data. METHODS AND FINDINGS: In this study, we developed and validated a machine learning algorithm to classify which patients have PASC (distinguishing between Multisystem Inflammatory Syndrome in Children (MIS-C) and non-MIS-C variants) from a cohort of patients with positive SARS-CoV-2 test results in pediatric health systems within the PEDSnet EHR network. Patient features included in the model were selected from conditions, procedures, performance of diagnostic testing, and medications using a tree-based scan statistic approach. We used an XGBoost model, with hyperparameters selected through cross-validated grid search, and model performance was assessed using 5-fold cross-validation. Model predictions and feature importance were evaluated using SHapley Additive exPlanations (SHAP) values. CONCLUSIONS: The model provides a tool for identifying patients with PASC and an approach to characterizing PASC using diagnosis, medication, laboratory, and procedure features in health systems data. Using appropriate threshold settings, the model can be used to identify PASC patients in health systems data at higher precision for inclusion in studies or at higher recall in screening for clinical trials, especially in settings where PASC diagnosis codes are used less frequently or less reliably. Analysis of how specific features contribute to the classification process may assist in gaining a better understanding of features that are associated with PASC diagnoses. |
format | Online Article Text |
id | pubmed-9810222 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-9810222 2023-01-04 A machine learning-based phenotype for long COVID in children: an EHR-based study from the RECOVER program Lorman, Vitaly Razzaghi, Hanieh Song, Xing Morse, Keith Utidjian, Levon Allen, Andrea J. Rao, Suchitra Rogerson, Colin Bennett, Tellen D. Morizono, Hiroki Eckrich, Daniel Jhaveri, Ravi Huang, Yungui Ranade, Daksha Pajor, Nathan Lee, Grace M. Forrest, Christopher B. Bailey, L. Charles medRxiv Article BACKGROUND: As clinical understanding of pediatric Post-Acute Sequelae of SARS-CoV-2 (PASC) develops, and hence the clinical definition evolves, it is desirable to have a method to reliably identify patients who are likely to have PASC in health systems data. METHODS AND FINDINGS: In this study, we developed and validated a machine learning algorithm to classify which patients have PASC (distinguishing between Multisystem Inflammatory Syndrome in Children (MIS-C) and non-MIS-C variants) from a cohort of patients with positive SARS-CoV-2 test results in pediatric health systems within the PEDSnet EHR network. Patient features included in the model were selected from conditions, procedures, performance of diagnostic testing, and medications using a tree-based scan statistic approach. We used an XGBoost model, with hyperparameters selected through cross-validated grid search, and model performance was assessed using 5-fold cross-validation. Model predictions and feature importance were evaluated using SHapley Additive exPlanations (SHAP) values. CONCLUSIONS: The model provides a tool for identifying patients with PASC and an approach to characterizing PASC using diagnosis, medication, laboratory, and procedure features in health systems data. Using appropriate threshold settings, the model can be used to identify PASC patients in health systems data at higher precision for inclusion in studies or at higher recall in screening for clinical trials, especially in settings where PASC diagnosis codes are used less frequently or less reliably. Analysis of how specific features contribute to the classification process may assist in gaining a better understanding of features that are associated with PASC diagnoses. Cold Spring Harbor Laboratory 2022-12-26 /pmc/articles/PMC9810222/ /pubmed/36597534 http://dx.doi.org/10.1101/2022.12.22.22283791 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator. |
spellingShingle | Article Lorman, Vitaly Razzaghi, Hanieh Song, Xing Morse, Keith Utidjian, Levon Allen, Andrea J. Rao, Suchitra Rogerson, Colin Bennett, Tellen D. Morizono, Hiroki Eckrich, Daniel Jhaveri, Ravi Huang, Yungui Ranade, Daksha Pajor, Nathan Lee, Grace M. Forrest, Christopher B. Bailey, L. Charles A machine learning-based phenotype for long COVID in children: an EHR-based study from the RECOVER program |
title | A machine learning-based phenotype for long COVID in children: an EHR-based study from the RECOVER program |
title_full | A machine learning-based phenotype for long COVID in children: an EHR-based study from the RECOVER program |
title_fullStr | A machine learning-based phenotype for long COVID in children: an EHR-based study from the RECOVER program |
title_full_unstemmed | A machine learning-based phenotype for long COVID in children: an EHR-based study from the RECOVER program |
title_short | A machine learning-based phenotype for long COVID in children: an EHR-based study from the RECOVER program |
title_sort | machine learning-based phenotype for long covid in children: an ehr-based study from the recover program |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9810222/ https://www.ncbi.nlm.nih.gov/pubmed/36597534 http://dx.doi.org/10.1101/2022.12.22.22283791 |
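
The abstract notes that the classifier's threshold can be set for higher precision (study inclusion) or higher recall (trial screening). The trade-off can be illustrated with a minimal sketch in plain Python. The scores, labels, and threshold values below are made up for illustration and are not data or settings from the study; `precision_recall_at` is a hypothetical helper, not part of the authors' code.

```python
from typing import List, Tuple


def precision_recall_at(
    scores: List[float], labels: List[int], threshold: float
) -> Tuple[float, float]:
    """Return (precision, recall) when scores >= threshold are called positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    # Conventions for empty denominators: no positive calls -> precision 1.0,
    # no true positives in the data -> recall 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Synthetic model scores and true labels (1 = has the condition); illustrative only.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

# A strict threshold favors precision; a lenient one favors recall.
high_precision = precision_recall_at(scores, labels, threshold=0.75)
high_recall = precision_recall_at(scores, labels, threshold=0.25)

print("threshold 0.75 -> precision, recall:", high_precision)
print("threshold 0.25 -> precision, recall:", high_recall)
```

On this toy data the strict threshold flags fewer patients but every flagged patient is a true case, while the lenient threshold captures every true case at the cost of some false positives.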