Digital surveillance in Latin American diseases outbreaks: information extraction from a novel Spanish corpus

BACKGROUND: In order to detect threats to public health and to be well-prepared for endemic and pandemic illness outbreaks, countries usually rely on event-based surveillance (EBS) and indicator-based surveillance systems. Event-based surveillance systems are key components of early warning systems...

Bibliographic Details
Main Authors: Dellanzo, Antonella, Cotik, Viviana, Lozano Barriga, Daniel Yunior, Mollapaza Apaza, Jonathan Jimmy, Palomino, Daniel, Schiaffino, Fernando, Yanque Aliaga, Alexander, Ochoa-Luna, José
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9780622/
https://www.ncbi.nlm.nih.gov/pubmed/36564712
http://dx.doi.org/10.1186/s12859-022-05094-y
author Dellanzo, Antonella
Cotik, Viviana
Lozano Barriga, Daniel Yunior
Mollapaza Apaza, Jonathan Jimmy
Palomino, Daniel
Schiaffino, Fernando
Yanque Aliaga, Alexander
Ochoa-Luna, José
author_sort Dellanzo, Antonella
collection PubMed
description BACKGROUND: In order to detect threats to public health and to be well-prepared for endemic and pandemic illness outbreaks, countries usually rely on event-based surveillance (EBS) and indicator-based surveillance systems. Event-based surveillance systems are key components of early warning systems and focus on fast capturing of data to detect threat signals through channels other than traditional surveillance. In this study, we develop Natural Language Processing tools that can be used within EBS systems. In particular, we focus on information extraction techniques that enable digital surveillance to monitor Internet data and social media. RESULTS: We created an annotated Spanish corpus from ProMED-mail health reports regarding disease outbreaks in Latin America. The corpus has been used to train algorithms for two information extraction tasks: named entity recognition and relation extraction. The algorithms, based on deep learning and rules, have been applied to recognize diseases, hosts, and geographical locations where a disease is occurring, among other entities and relations. In addition, an in-depth analysis of micro-average F1 metrics shows the suitability of our approaches for both tasks. CONCLUSIONS: The annotated corpus and algorithms presented could leverage the development of automated tools for extracting information from news and health reports written in Spanish. Moreover, this framework could be useful within EBS systems to support the early detection of Latin American disease outbreaks.
format Online
Article
Text
id pubmed-9780622
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9780622 2022-12-23 BMC Bioinformatics Research
BioMed Central 2022-12-23 /pmc/articles/PMC9780622/ /pubmed/36564712 http://dx.doi.org/10.1186/s12859-022-05094-y Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Digital surveillance in Latin American diseases outbreaks: information extraction from a novel Spanish corpus
topic Research