Development of an algorithm for finding pertussis episodes in a population-based electronic health record database

Bibliographic Details
Main authors: Daluwatte, Chathuri, Dvaretskaya, Maryia, Ekhtiari, Sam, Hayat, Paul, Montmerle, Martin, Mathur, Sachin, Macina, Denis
Format: Online Article Text
Language: English
Published: Taylor & Francis 2023
Subjects: Technology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10184588/
https://www.ncbi.nlm.nih.gov/pubmed/37171155
http://dx.doi.org/10.1080/21645515.2023.2209455

Collection: PubMed

Abstract
While tetanus-diphtheria-acellular pertussis (Tdap) vaccines for adolescents and adults were licensed in 2005 and immunization strategies proposed, the burden of pertussis in this population remains under-recognized mainly due to atypical disease presentation, undermining efforts to optimize protection through vaccination. We developed a machine learning algorithm to identify undiagnosed/misdiagnosed pertussis episodes in patients diagnosed with acute respiratory disease (ARD) using signs, diseases and symptoms from clinician notes and demographic information within electronic health-care records (Optum Humedica repository [2007–2019]). We used two patient cohorts aged ≥11 years to develop the model: a positive pertussis cohort (4,515 episodes in 4,316 patients) and a negative pertussis (ARD) cohort (4,573,445 episodes and patients), defined using ICD 9/10 codes. To improve contrast between positive pertussis and negative pertussis (ARD) episodes, only episodes with ≥7 symptoms were selected. LightGBM was used as the machine learning model for pertussis episode identification. Model validity was determined using laboratory-confirmed pertussis positive and negative cohorts. Model explainability was obtained using the Shapley additive explanations method. The predictive performance was as follows: area under the precision–recall curve, 0.24 (SD, 7 × 10⁻³); recall, 0.72 (SD, 4 × 10⁻³); precision, 0.012 (SD, 1 × 10⁻³); and specificity, 0.94 (SD, 7 × 10⁻³). The model applied to laboratory-confirmed positive and negative pertussis episodes had a specificity of 0.846. Predictive probability for pertussis increased with presence of whooping cough, whoop, and post-tussive vomiting in clinician notes, but decreased with gastrointestinal bleeding, sepsis, pulmonary symptoms, and fever. In conclusion, machine learning can help identify pertussis episodes among those diagnosed with ARD.
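
The combination of a specificity of 0.94 with a precision of only 0.012 is largely a consequence of how rare pertussis episodes are among ARD episodes. The Python sketch below applies Bayes' rule to the reported recall and specificity. It assumes (the abstract does not state this) that the evaluation data keep roughly the class ratio of the full development cohorts, 4,515 pertussis versus 4,573,445 ARD episodes, before the ≥7-symptom filter. The helper `precision_from_rates` is a hypothetical name used for illustration only and is not the authors' code.

```python
# Illustrative sketch, not the study's evaluation code.
# Assumption: prevalence is taken from the full development cohorts quoted in
# the abstract (before the >=7-symptom filter), so this only approximates the
# class ratio the model was actually scored on.


def precision_from_rates(recall: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value (precision) via Bayes' rule.

    precision = recall * p / (recall * p + (1 - specificity) * (1 - p))
    """
    true_pos = recall * prevalence                        # expected share of all episodes that are true positives
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # expected share that are false positives
    denominator = true_pos + false_pos
    if denominator == 0.0:
        return 0.0  # nothing is ever flagged positive; precision is undefined, reported as 0 here
    return true_pos / denominator


positive_episodes = 4_515      # pertussis cohort (episodes), from the abstract
negative_episodes = 4_573_445  # ARD cohort (episodes), from the abstract
prevalence = positive_episodes / (positive_episodes + negative_episodes)

precision = precision_from_rates(recall=0.72, specificity=0.94, prevalence=prevalence)
print(f"prevalence = {prevalence:.5f}")
print(f"precision  = {precision:.4f}")
```

Under that assumption the script prints a prevalence of 0.00099 and a precision of 0.0117, close to the reported 0.012. With a balanced 50/50 set, the same recall and specificity would give a precision of about 0.92, which illustrates why precision–recall metrics are more informative than specificity alone for rare outcomes.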

Record ID: pubmed-10184588
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Human Vaccines & Immunotherapeutics (Hum Vaccin Immunother), Taylor & Francis, published 2023-05-12
License: © 2023 Sanofi US Inc. Published with license by Taylor & Francis Group, LLC. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way. The terms on which this article has been published allow the posting of the Accepted Manuscript in a repository by the author(s) or with their consent.