Cargando…

Improving Methods of Identifying Anaphylaxis for Medical Product Safety Surveillance Using Natural Language Processing and Machine Learning

We sought to determine whether machine learning and natural language processing (NLP) applied to electronic medical records could improve performance of automated health-care claims-based algorithms to identify anaphylaxis events using data on 516 patients with outpatient, emergency department, or i...

Descripción completa

Detalles Bibliográficos
Autores principales: Carrell, David S, Gruber, Susan, Floyd, James S, Bann, Maralyssa A, Cushing-Haugen, Kara L, Johnson, Ron L, Graham, Vina, Cronkite, David J, Hazlehurst, Brian L, Felcher, Andrew H, Bejan, Cosmin A, Kennedy, Adee, Shinde, Mayura U, Karami, Sara, Ma, Yong, Stojanovic, Danijela, Zhao, Yueqin, Ball, Robert, Nelson, Jennifer C
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Oxford University Press 2022
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9896464/
https://www.ncbi.nlm.nih.gov/pubmed/36331289
http://dx.doi.org/10.1093/aje/kwac182
Descripción
Sumario:We sought to determine whether machine learning and natural language processing (NLP) applied to electronic medical records could improve performance of automated health-care claims-based algorithms to identify anaphylaxis events using data on 516 patients with outpatient, emergency department, or inpatient anaphylaxis diagnosis codes during 2015–2019 in 2 integrated health-care institutions in the Northwest United States. We used one site’s manually reviewed gold-standard outcomes data for model development and the other’s for external validation based on cross-validated area under the receiver operating characteristic curve (AUC), positive predictive value (PPV), and sensitivity. In the development site 154 (64%) of 239 potential events met adjudication criteria for anaphylaxis compared with 180 (65%) of 277 in the validation site. Logistic regression models using only structured claims data achieved a cross-validated AUC of 0.58 (95% CI: 0.54, 0.63). Machine learning improved cross-validated AUC to 0.62 (0.58, 0.66); incorporating NLP-derived covariates further increased cross-validated AUCs to 0.70 (0.66, 0.75) in development and 0.67 (0.63, 0.71) in external validation data. A classification threshold with cross-validated PPV of 79% and cross-validated sensitivity of 66% in development data had cross-validated PPV of 78% and cross-validated sensitivity of 56% in external data. Machine learning and NLP-derived data improved identification of validated anaphylaxis events.