414. Developing Digital Phenotypes of Primary Immune Deficiencies Using Machine Learning on a Large Electronic Health Record Database
Main authors: | , , , , , |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6809140/ http://dx.doi.org/10.1093/ofid/ofz360.487 |
Summary: | BACKGROUND: More than 350 genetic disorders cause immune deficiencies. Because these conditions are rare, in-depth study of the infections associated with primary immune deficiencies (PID) requires extremely large sample sizes drawn from broad populations. Using a large electronic health record (EHR) dataset, we linked clinical and microbiologic data to develop digital phenotypes for PID. METHODS: Using the Cerner HealthFacts EHR dataset from 2009 to 2017, we extracted clinical and microbiologic data for hospitalizations of patients <18 years old with ICD-9/10 PID diagnoses and ≥1 positive culture for infection. Machine learning models were used to identify key features that predict PID diagnosis. Features included patient and hospitalization characteristics; infectious agent and infection site; and selected comorbidities. Models were validated using the area under the receiver operating characteristic curve (AUC). RESULTS: Overall, 1316 patients with a PID were identified (Table 1). The 10 most common pathogens, by PID, are listed in Table 2. The models classified DiGeorge syndrome (positive predictive value [PPV] 49%), functional disorders of polymorphonuclear neutrophils (PMN) (PPV 43%), and common variable immunodeficiency (CVID) (PPV 47%) better than combined immunodeficiency (CID) (PPV 20%); the overall true positive rate was 47%, with an AUC of 0.73. The predictive features for each PID were as follows: for CVID, having enteritis, hypertension, and pneumonia (Figure 1a); for PMN disorders, having hypoxia and hypertension (Figure 1b); for DiGeorge syndrome, having congenital deformities and not having hypertension (Figure 1c); and for CID, finding Staphylococcus aureus in a wound or Escherichia coli in the blood (Figure 1d). CONCLUSION: Early models demonstrate some discrimination, specifically for more common PIDs (CVID) and those with highly identifying factors (DiGeorge syndrome). These models can be improved by including a wider array of clinical data, and they provide a first look at a new methodology to digitally phenotype PIDs for future diagnostic use. DISCLOSURES: All authors: No reported disclosures. |
---|
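
The summary reports three evaluation metrics: positive predictive value (PPV) per diagnosis, an overall true positive rate, and AUC. As a reading aid, the following is a minimal Python sketch of how such metrics are commonly computed. It is not the authors' code. The function names and the idea of scoring one diagnosis against all others are assumptions made for illustration, and the abstract does not state how the multi-class AUC was aggregated.

```python
from typing import Sequence


def positive_predictive_value(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """PPV for one class: TP / (TP + FP), i.e. how often a prediction of `label` is right."""
    truths_when_predicted = [t for t, p in zip(y_true, y_pred) if p == label]
    if not truths_when_predicted:
        return float("nan")  # the class was never predicted, so PPV is undefined
    return sum(t == label for t in truths_when_predicted) / len(truths_when_predicted)


def true_positive_rate(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """TPR (sensitivity) for one class: TP / (TP + FN)."""
    preds_when_actual = [p for t, p in zip(y_true, y_pred) if t == label]
    if not preds_when_actual:
        return float("nan")  # the class never occurs, so TPR is undefined
    return sum(p == label for p in preds_when_actual) / len(preds_when_actual)


def binary_auc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """AUC as the probability that a random positive case outscores a random
    negative case, with ties counted as one half (the Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy labels, NOT data from the study.
toy_true = ["CVID", "CVID", "CID", "DiGeorge", "CID", "CVID"]
toy_pred = ["CVID", "CID", "CVID", "DiGeorge", "CID", "CVID"]
print(positive_predictive_value(toy_true, toy_pred, "CVID"))
print(true_positive_rate(toy_true, toy_pred, "CVID"))
print(binary_auc([0.9, 0.8, 0.4, 0.3], [True, False, True, False]))
```

Read this way, a PPV of 20% for CID would mean that roughly one in five hospitalizations the model labeled as CID actually carried that diagnosis, while an AUC of 0.73 describes how well the model's scores rank cases independent of any single decision threshold.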