A generalizable data assembly algorithm for infectious disease outbreaks
During infectious disease outbreaks, health agencies often share text-based information about cases and deaths. This information is rarely machine-readable, thus creating challenges for outbreak researchers. Here, we introduce a generalizable data assembly algorithm that automatically curates text-b...
Main authors: | Majumder, Maimuna S; Rose, Sherri |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2021 |
Subjects: | Brief Communications |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8327373/ https://www.ncbi.nlm.nih.gov/pubmed/34350393 http://dx.doi.org/10.1093/jamiaopen/ooab058 |
_version_ | 1783732061253664768 |
---|---|
author | Majumder, Maimuna S Rose, Sherri |
author_facet | Majumder, Maimuna S Rose, Sherri |
author_sort | Majumder, Maimuna S |
collection | PubMed |
description | During infectious disease outbreaks, health agencies often share text-based information about cases and deaths. This information is rarely machine-readable, thus creating challenges for outbreak researchers. Here, we introduce a generalizable data assembly algorithm that automatically curates text-based, outbreak-related information and demonstrate its performance across 3 outbreaks. After developing an algorithm with regular expressions, we automatically curated data from health agencies via 3 information sources: formal reports, email newsletters, and Twitter. A validation data set was also curated manually for each outbreak, and an implementation process was presented for application to future outbreaks. When compared against the validation data sets, the overall cumulative missingness and misidentification of the algorithmically curated data were ≤2% and ≤1%, respectively, for all 3 outbreaks. Within the context of outbreak research, our work successfully addresses the need for generalizable tools that can transform text-based information into machine-readable data across varied information sources and infectious diseases. |
format | Online Article Text |
id | pubmed-8327373 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8327373 2021-08-03 A generalizable data assembly algorithm for infectious disease outbreaks Majumder, Maimuna S Rose, Sherri JAMIA Open Brief Communications Oxford University Press 2021-08-02 /pmc/articles/PMC8327373/ /pubmed/34350393 http://dx.doi.org/10.1093/jamiaopen/ooab058 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Brief Communications Majumder, Maimuna S Rose, Sherri A generalizable data assembly algorithm for infectious disease outbreaks |
title | A generalizable data assembly algorithm for infectious disease outbreaks |
topic | Brief Communications |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8327373/ https://www.ncbi.nlm.nih.gov/pubmed/34350393 http://dx.doi.org/10.1093/jamiaopen/ooab058 |