
A generalizable data assembly algorithm for infectious disease outbreaks

During infectious disease outbreaks, health agencies often share text-based information about cases and deaths. This information is rarely machine-readable, thus creating challenges for outbreak researchers. Here, we introduce a generalizable data assembly algorithm that automatically curates text-b...


Bibliographic Details
Main Authors: Majumder, Maimuna S, Rose, Sherri
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8327373/
https://www.ncbi.nlm.nih.gov/pubmed/34350393
http://dx.doi.org/10.1093/jamiaopen/ooab058
author Majumder, Maimuna S
Rose, Sherri
collection PubMed
description During infectious disease outbreaks, health agencies often share text-based information about cases and deaths. This information is rarely machine-readable, thus creating challenges for outbreak researchers. Here, we introduce a generalizable data assembly algorithm that automatically curates text-based, outbreak-related information and demonstrate its performance across 3 outbreaks. After developing an algorithm with regular expressions, we automatically curated data from health agencies via 3 information sources: formal reports, email newsletters, and Twitter. A validation data set was also curated manually for each outbreak, and an implementation process was presented for application to future outbreaks. When compared against the validation data sets, the overall cumulative missingness and misidentification of the algorithmically curated data were ≤2% and ≤1%, respectively, for all 3 outbreaks. Within the context of outbreak research, our work successfully addresses the need for generalizable tools that can transform text-based information into machine-readable data across varied information sources and infectious diseases.
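The description above mentions curating case and death counts from text with regular expressions. As a minimal sketch of that general idea (not the authors' algorithm, whose actual expressions are not given in this record), the Python below pulls the first reported case count and death count out of a sentence. The patterns, the function name `extract_counts`, and the sample sentences are all hypothetical and chosen only for illustration.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns for illustration only; the expressions used in the
# paper are not reproduced in this catalog record.
# A number is either comma-grouped (e.g. "1,204") or a plain run of digits.
NUMBER = r"(\d{1,3}(?:,\d{3})+|\d+)"

# A number, then up to three intervening words (e.g. "new confirmed"),
# then the keyword in singular or plural form.
CASES_RE = re.compile(NUMBER + r"\s+(?:\w+\s+){0,3}?cases?\b", re.IGNORECASE)
DEATHS_RE = re.compile(NUMBER + r"\s+(?:\w+\s+){0,3}?deaths?\b", re.IGNORECASE)


def _to_int(match: Optional[re.Match]) -> Optional[int]:
    """Convert the captured number to an int, or None if nothing matched."""
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def extract_counts(text: str) -> Dict[str, Optional[int]]:
    """Return the first case count and first death count found in `text`.

    A value of None marks a count that could not be found, which is one
    simple way to represent a missing observation.
    """
    return {
        "cases": _to_int(CASES_RE.search(text)),
        "deaths": _to_int(DEATHS_RE.search(text)),
    }


if __name__ == "__main__":
    sample = "As of 12 May, 1,204 confirmed cases and 37 deaths have been reported."
    print(extract_counts(sample))
```

In a sketch like this, returning `None` for an unmatched count keeps missing values distinguishable from a reported zero, which matters when the extracted data are later compared against a manually curated validation set.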
format Online
Article
Text
id pubmed-8327373
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8327373 2021-08-03 A generalizable data assembly algorithm for infectious disease outbreaks Majumder, Maimuna S Rose, Sherri JAMIA Open Brief Communications Oxford University Press 2021-08-02 /pmc/articles/PMC8327373/ /pubmed/34350393 http://dx.doi.org/10.1093/jamiaopen/ooab058 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
For commercial re-use, please contact journals.permissions@oup.com
title A generalizable data assembly algorithm for infectious disease outbreaks
topic Brief Communications