
PADI-web corpus: Labeled textual data in animal health domain


Bibliographic Details
Main authors:	Rabatel, Julien; Arsevska, Elena; Roche, Mathieu
Format:	Online Article Text
Language:	English
Published:	Elsevier, 2018
Subjects:	Computer Science
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6327737/ https://www.ncbi.nlm.nih.gov/pubmed/30671512 http://dx.doi.org/10.1016/j.dib.2018.12.063

author	Rabatel, Julien Arsevska, Elena Roche, Mathieu
collection	PubMed
description	Monitoring animal health worldwide, especially the early detection of outbreaks of emerging pathogens, is one means of preventing the introduction of infectious diseases into a country (Collier et al., 2008) [3]. In this context, we developed PADI-web, a Platform for Automated extraction of animal Disease Information from the Web (Arsevska et al., 2016, 2018). PADI-web is a text-mining tool that automatically detects, categorizes and extracts disease outbreak information from Web news articles. PADI-web currently monitors the Web for five emerging animal infectious diseases: African swine fever, avian influenza (both highly pathogenic and low pathogenic), foot-and-mouth disease, bluetongue, and Schmallenberg virus infection. PADI-web collects Web news articles in near-real time through RSS feeds. Currently, PADI-web collects disease information from Google News because of its international, multilingual coverage. We implemented machine learning techniques to identify the relevant disease information in texts (i.e., location and date of an outbreak, affected hosts, their numbers and clinical signs). To train the Information Extraction (IE) model on news articles, domain experts manually labeled an English-language corpus. This labeled corpus (Rabatel et al., 2017) is presented in this data paper.
format	Online Article Text
id	pubmed-6327737
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Elsevier
record_format	MEDLINE/PubMed
spelling	pubmed-6327737 2019-01-22 PADI-web corpus: Labeled textual data in animal health domain Rabatel, Julien Arsevska, Elena Roche, Mathieu Data Brief Computer Science Monitoring animal health worldwide, especially the early detection of outbreaks of emerging pathogens, is one of the means of preventing the introduction of infectious diseases in countries (Collier et al., 2008) [3]. In this context, we developed PADI-web, a Platform for Automated extraction of animal Disease Information from the Web (Arsevska et al., 2016, 2018). PADI-web is a text-mining tool that automatically detects, categorizes and extracts disease outbreak information from Web news articles. PADI-web currently monitors the Web for five emerging animal infectious diseases, i.e., African swine fever, avian influenza including highly pathogenic and low pathogenic avian influenza, foot-and-mouth disease, bluetongue, and Schmallenberg virus infection. PADI-web collects Web news articles in near-real time through RSS feeds. Currently, PADI-web collects disease information from Google News because of its international and multiple language coverage. We implemented machine learning techniques to identify the relevant disease information in texts (i.e., location and date of an outbreak, affected hosts, their numbers and clinical signs). In order to train the model for Information Extraction (IE) from news articles, a corpus in English has been manually labeled by domain experts. This labeled corpus (Rabatel et al., 2017) is presented in this data paper. Elsevier 2018-12-23 /pmc/articles/PMC6327737/ /pubmed/30671512 http://dx.doi.org/10.1016/j.dib.2018.12.063 Text en © 2019 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title	PADI-web corpus: Labeled textual data in animal health domain
topic	Computer Science
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6327737/ https://www.ncbi.nlm.nih.gov/pubmed/30671512 http://dx.doi.org/10.1016/j.dib.2018.12.063
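The description notes that PADI-web collects Web news articles in near-real time through RSS feeds and then uses machine learning to identify relevant disease information. As a minimal sketch of only the first filtering step, the Python snippet below parses an RSS 2.0 feed with the standard library and flags items that mention one of the five monitored diseases. The keyword lists and the functions `parse_rss_items`, `match_diseases` and `filter_relevant` are hypothetical; whole-word keyword matching is a simple stand-in chosen for illustration, not PADI-web's actual machine-learning classifier.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, illustrative keyword lists for the five diseases PADI-web monitors.
# These are assumptions for this sketch, NOT PADI-web's actual vocabulary.
DISEASE_KEYWORDS = {
    "african swine fever": ["african swine fever"],
    "avian influenza": ["avian influenza", "bird flu"],
    "foot-and-mouth disease": ["foot-and-mouth", "foot and mouth"],
    "bluetongue": ["bluetongue"],
    "schmallenberg virus infection": ["schmallenberg"],
}


def parse_rss_items(rss_xml: str) -> list[dict]:
    """Extract title, link and description from each <item> of an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        }
        for item in root.iter("item")
    ]


def match_diseases(text: str) -> list[str]:
    """Return the monitored diseases whose keywords occur as whole words in text."""
    lowered = text.lower()
    return [
        disease
        for disease, keywords in DISEASE_KEYWORDS.items()
        if any(re.search(r"\b" + re.escape(kw) + r"\b", lowered) for kw in keywords)
    ]


def filter_relevant(items: list[dict]) -> list[tuple[dict, list[str]]]:
    """Keep only items whose title or description mentions a monitored disease."""
    relevant = []
    for item in items:
        diseases = match_diseases(item["title"] + " " + item["description"])
        if diseases:
            relevant.append((item, diseases))
    return relevant


if __name__ == "__main__":
    sample = (
        '<rss version="2.0"><channel>'
        "<item><title>Bluetongue confirmed on farm</title>"
        "<link>http://example.com/1</link><description>Cattle affected.</description></item>"
        "<item><title>Local election results</title>"
        "<link>http://example.com/2</link><description>No animal news.</description></item>"
        "</channel></rss>"
    )
    for item, diseases in filter_relevant(parse_rss_items(sample)):
        print(item["link"], diseases)
```

Keyword matching is easy to audit but misses paraphrases, other languages and abbreviations, which is one reason a manually labeled corpus such as this one is useful for training a learned classifier instead.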

