Digital surveillance in Latin American diseases outbreaks: information extraction from a novel Spanish corpus
BACKGROUND: In order to detect threats to public health and to be well-prepared for endemic and pandemic illness outbreaks, countries usually rely on event-based surveillance (EBS) and indicator-based surveillance systems. Event-based surveillance systems are key components of early warning systems...
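Main authors: | Dellanzo, Antonella; Cotik, Viviana; Lozano Barriga, Daniel Yunior; Mollapaza Apaza, Jonathan Jimmy; Palomino, Daniel; Schiaffino, Fernando; Yanque Aliaga, Alexander; Ochoa-Luna, José |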
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9780622/ https://www.ncbi.nlm.nih.gov/pubmed/36564712 http://dx.doi.org/10.1186/s12859-022-05094-y |
_version_ | 1784856876910903296 |
---|---|
author | Dellanzo, Antonella; Cotik, Viviana; Lozano Barriga, Daniel Yunior; Mollapaza Apaza, Jonathan Jimmy; Palomino, Daniel; Schiaffino, Fernando; Yanque Aliaga, Alexander; Ochoa-Luna, José |
author_facet | Dellanzo, Antonella; Cotik, Viviana; Lozano Barriga, Daniel Yunior; Mollapaza Apaza, Jonathan Jimmy; Palomino, Daniel; Schiaffino, Fernando; Yanque Aliaga, Alexander; Ochoa-Luna, José |
author_sort | Dellanzo, Antonella |
collection | PubMed |
description | BACKGROUND: In order to detect threats to public health and to be well-prepared for endemic and pandemic illness outbreaks, countries usually rely on event-based surveillance (EBS) and indicator-based surveillance systems. Event-based surveillance systems are key components of early warning systems and focus on fast capturing of data to detect threat signals through channels other than traditional surveillance. In this study, we develop Natural Language Processing tools that can be used within EBS systems. In particular, we focus on information extraction techniques that enable digital surveillance to monitor Internet data and social media. RESULTS: We created an annotated Spanish corpus from ProMED-mail health reports regarding disease outbreaks in Latin America. The corpus has been used to train algorithms for two information extraction tasks: named entity recognition and relation extraction. The algorithms, based on deep learning and rules, have been applied to recognize diseases, hosts, and geographical locations where a disease is occurring, among other entities and relations. In addition, an in-depth analysis of micro-average F1 metrics shows the suitability of our approaches for both tasks. CONCLUSIONS: The annotated corpus and algorithms presented could leverage the development of automated tools for extracting information from news and health reports written in Spanish. Moreover, this framework could be useful within EBS systems to support the early detection of Latin American disease outbreaks. |
format | Online Article Text |
id | pubmed-9780622 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9780622 2022-12-23 Digital surveillance in Latin American diseases outbreaks: information extraction from a novel Spanish corpus Dellanzo, Antonella; Cotik, Viviana; Lozano Barriga, Daniel Yunior; Mollapaza Apaza, Jonathan Jimmy; Palomino, Daniel; Schiaffino, Fernando; Yanque Aliaga, Alexander; Ochoa-Luna, José BMC Bioinformatics Research BACKGROUND: In order to detect threats to public health and to be well-prepared for endemic and pandemic illness outbreaks, countries usually rely on event-based surveillance (EBS) and indicator-based surveillance systems. Event-based surveillance systems are key components of early warning systems and focus on fast capturing of data to detect threat signals through channels other than traditional surveillance. In this study, we develop Natural Language Processing tools that can be used within EBS systems. In particular, we focus on information extraction techniques that enable digital surveillance to monitor Internet data and social media. RESULTS: We created an annotated Spanish corpus from ProMED-mail health reports regarding disease outbreaks in Latin America. The corpus has been used to train algorithms for two information extraction tasks: named entity recognition and relation extraction. The algorithms, based on deep learning and rules, have been applied to recognize diseases, hosts, and geographical locations where a disease is occurring, among other entities and relations. In addition, an in-depth analysis of micro-average F1 metrics shows the suitability of our approaches for both tasks. CONCLUSIONS: The annotated corpus and algorithms presented could leverage the development of automated tools for extracting information from news and health reports written in Spanish. Moreover, this framework could be useful within EBS systems to support the early detection of Latin American disease outbreaks. 
BioMed Central 2022-12-23 /pmc/articles/PMC9780622/ /pubmed/36564712 http://dx.doi.org/10.1186/s12859-022-05094-y Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Dellanzo, Antonella; Cotik, Viviana; Lozano Barriga, Daniel Yunior; Mollapaza Apaza, Jonathan Jimmy; Palomino, Daniel; Schiaffino, Fernando; Yanque Aliaga, Alexander; Ochoa-Luna, José Digital surveillance in Latin American diseases outbreaks: information extraction from a novel Spanish corpus |
title | Digital surveillance in Latin American diseases outbreaks: information extraction from a novel Spanish corpus |
title_full | Digital surveillance in Latin American diseases outbreaks: information extraction from a novel Spanish corpus |
title_fullStr | Digital surveillance in Latin American diseases outbreaks: information extraction from a novel Spanish corpus |
title_full_unstemmed | Digital surveillance in Latin American diseases outbreaks: information extraction from a novel Spanish corpus |
title_short | Digital surveillance in Latin American diseases outbreaks: information extraction from a novel Spanish corpus |
title_sort | digital surveillance in latin american diseases outbreaks: information extraction from a novel spanish corpus |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9780622/ https://www.ncbi.nlm.nih.gov/pubmed/36564712 http://dx.doi.org/10.1186/s12859-022-05094-y |
work_keys_str_mv | AT dellanzoantonella digitalsurveillanceinlatinamericandiseasesoutbreaksinformationextractionfromanovelspanishcorpus AT cotikviviana digitalsurveillanceinlatinamericandiseasesoutbreaksinformationextractionfromanovelspanishcorpus AT lozanobarrigadanielyunior digitalsurveillanceinlatinamericandiseasesoutbreaksinformationextractionfromanovelspanishcorpus AT mollapazaapazajonathanjimmy digitalsurveillanceinlatinamericandiseasesoutbreaksinformationextractionfromanovelspanishcorpus AT palominodaniel digitalsurveillanceinlatinamericandiseasesoutbreaksinformationextractionfromanovelspanishcorpus AT schiaffinofernando digitalsurveillanceinlatinamericandiseasesoutbreaksinformationextractionfromanovelspanishcorpus AT yanquealiagaalexander digitalsurveillanceinlatinamericandiseasesoutbreaksinformationextractionfromanovelspanishcorpus AT ochoalunajose digitalsurveillanceinlatinamericandiseasesoutbreaksinformationextractionfromanovelspanishcorpus |