Development of an algorithm for finding pertussis episodes in a population-based electronic health record database
Main authors: | Daluwatte, Chathuri; Dvaretskaya, Maryia; Ekhtiari, Sam; Hayat, Paul; Montmerle, Martin; Mathur, Sachin; Macina, Denis |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Taylor & Francis, 2023 |
Subjects: | Technology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10184588/ https://www.ncbi.nlm.nih.gov/pubmed/37171155 http://dx.doi.org/10.1080/21645515.2023.2209455 |
collection | PubMed |
---|---|
description | While tetanus-diphtheria-acellular pertussis (Tdap) vaccines for adolescents and adults were licensed in 2005 and immunization strategies proposed, the burden of pertussis in this population remains under-recognized mainly due to atypical disease presentation, undermining efforts to optimize protection through vaccination. We developed a machine learning algorithm to identify undiagnosed/misdiagnosed pertussis episodes in patients diagnosed with acute respiratory disease (ARD) using signs, diseases and symptoms from clinician notes and demographic information within electronic health-care records (Optum Humedica repository [2007–2019]). We used two patient cohorts aged ≥11 years to develop the model: a positive pertussis cohort (4,515 episodes in 4,316 patients) and a negative pertussis (ARD) cohort (4,573,445 episodes and patients), defined using ICD 9/10 codes. To improve contrast between positive pertussis and negative pertussis (ARD) episodes, only episodes with ≥7 symptoms were selected. LightGBM was used as the machine learning model for pertussis episode identification. Model validity was determined using laboratory-confirmed pertussis positive and negative cohorts. Model explainability was obtained using the Shapley additive explanations method. The predictive performance was as follows: area under the precision–recall curve, 0.24 (SD, 7 × 10⁻³); recall, 0.72 (SD, 4 × 10⁻³); precision, 0.012 (SD, 1 × 10⁻³); and specificity, 0.94 (SD, 7 × 10⁻³). The model applied to laboratory-confirmed positive and negative pertussis episodes had a specificity of 0.846. Predictive probability for pertussis increased with presence of whooping cough, whoop, and post-tussive vomiting in clinician notes, but decreased with gastrointestinal bleeding, sepsis, pulmonary symptoms, and fever. In conclusion, machine learning can help identify pertussis episodes among those diagnosed with ARD. |
id | pubmed-10184588 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | Hum Vaccin Immunother (section: Technology); published 2023-05-12 |
rights | © 2023 Sanofi US Inc. Published with license by Taylor & Francis Group, LLC. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way. The terms on which this article has been published allow the posting of the Accepted Manuscript in a repository by the author(s) or with their consent. |
topic | Technology |
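The abstract reports a high specificity (0.94) alongside a very low precision (0.012). Under strong class imbalance these figures are not contradictory, since even a small false-positive rate applied to millions of negative episodes swamps the true positives. The following is a minimal illustrative sketch, not the authors' code. It assumes that class prevalence can be approximated by the ratio of the two development cohorts (4,515 positive vs 4,573,445 negative episodes). The abstract does not report the test-set prevalence, and the ≥7-symptom filter changes both counts. The function name `precision_from_rates` is hypothetical.

```python
# Illustrative arithmetic only (hypothetical helper, not from the paper):
# expected precision (positive predictive value) from recall, specificity
# and class prevalence, via Bayes' rule.

def precision_from_rates(recall: float, specificity: float, prevalence: float) -> float:
    """Return expected precision given recall (sensitivity), specificity and prevalence."""
    true_pos = recall * prevalence                        # share of all episodes that are true positives
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # share of all episodes that are false positives
    return true_pos / (true_pos + false_pos)


# Assumption: prevalence approximated from the development cohort sizes,
# ignoring the >=7-symptom selection step.
positives = 4_515
negatives = 4_573_445
prevalence = positives / (positives + negatives)

ppv = precision_from_rates(recall=0.72, specificity=0.94, prevalence=prevalence)
print(f"prevalence = {prevalence:.5f}")
print(f"expected precision = {ppv:.4f}")
```

With these inputs the sketch gives a precision of roughly 0.0117, close to the reported 0.012. This only shows that the reported recall, specificity and precision are mutually consistent under the stated prevalence assumption; it does not describe how the authors computed their metrics.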