
Labeled entities from social media data related to avian influenza disease

This dataset consists of spatial (e.g. location) and thematic (e.g. diseases, symptoms, virus) entities concerning avian influenza in English-language social media text. It was created from three corpora: the first one includes 10 transcriptions of YouTube videos and 70 tweets manually annota...


Bibliographic Details
Main authors: Schaeffer, Camille; Interdonato, Roberto; Lancelot, Renaud; Roche, Mathieu; Teisseire, Maguelonne
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9184875/
https://www.ncbi.nlm.nih.gov/pubmed/35692611
http://dx.doi.org/10.1016/j.dib.2022.108317
_version_ 1784724625531338752
author Schaeffer, Camille
Interdonato, Roberto
Lancelot, Renaud
Roche, Mathieu
Teisseire, Maguelonne
author_facet Schaeffer, Camille
Interdonato, Roberto
Lancelot, Renaud
Roche, Mathieu
Teisseire, Maguelonne
author_sort Schaeffer, Camille
collection PubMed
description This dataset consists of spatial (e.g. location) and thematic (e.g. diseases, symptoms, virus) entities concerning avian influenza in English-language social media text. It was created from three corpora: the first comprises 10 transcriptions of YouTube videos and 70 tweets, all manually annotated. The second corpus consists of the same textual data, automatically annotated with Named Entity Recognition (NER) tools. These two corpora were built to evaluate NER tools and then apply them to a larger corpus. The third corpus consists of 100 YouTube transcriptions automatically annotated with NER tools. The aim of the annotation task is to recognize spatial information, such as city names, and epidemiological information, such as disease names. An annotation guideline is provided to ensure consistent annotation and to help the annotators. This dataset can be used to train or evaluate Natural Language Processing (NLP) approaches such as specialized entity recognition.
format Online
Article
Text
id pubmed-9184875
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-9184875 2022-06-11 Labeled entities from social media data related to avian influenza disease Schaeffer, Camille Interdonato, Roberto Lancelot, Renaud Roche, Mathieu Teisseire, Maguelonne Data Brief Data Article This dataset consists of spatial (e.g. location) and thematic (e.g. diseases, symptoms, virus) entities concerning avian influenza in English-language social media text. It was created from three corpora: the first comprises 10 transcriptions of YouTube videos and 70 tweets, all manually annotated. The second corpus consists of the same textual data, automatically annotated with Named Entity Recognition (NER) tools. These two corpora were built to evaluate NER tools and then apply them to a larger corpus. The third corpus consists of 100 YouTube transcriptions automatically annotated with NER tools. The aim of the annotation task is to recognize spatial information, such as city names, and epidemiological information, such as disease names. An annotation guideline is provided to ensure consistent annotation and to help the annotators. This dataset can be used to train or evaluate Natural Language Processing (NLP) approaches such as specialized entity recognition. Elsevier 2022-05-27 /pmc/articles/PMC9184875/ /pubmed/35692611 http://dx.doi.org/10.1016/j.dib.2022.108317 Text en © 2022 The Authors. Published by Elsevier Inc. https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Data Article
Schaeffer, Camille
Interdonato, Roberto
Lancelot, Renaud
Roche, Mathieu
Teisseire, Maguelonne
Labeled entities from social media data related to avian influenza disease
title Labeled entities from social media data related to avian influenza disease
title_full Labeled entities from social media data related to avian influenza disease
title_fullStr Labeled entities from social media data related to avian influenza disease
title_full_unstemmed Labeled entities from social media data related to avian influenza disease
title_short Labeled entities from social media data related to avian influenza disease
title_sort labeled entities from social media data related to avian influenza disease
topic Data Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9184875/
https://www.ncbi.nlm.nih.gov/pubmed/35692611
http://dx.doi.org/10.1016/j.dib.2022.108317
work_keys_str_mv AT schaeffercamille labeledentitiesfromsocialmediadatarelatedtoavianinfluenzadisease
AT interdonatoroberto labeledentitiesfromsocialmediadatarelatedtoavianinfluenzadisease
AT lancelotrenaud labeledentitiesfromsocialmediadatarelatedtoavianinfluenzadisease
AT rochemathieu labeledentitiesfromsocialmediadatarelatedtoavianinfluenzadisease
AT teisseiremaguelonne labeledentitiesfromsocialmediadatarelatedtoavianinfluenzadisease