BiodivNERE: Gold standard corpora for named entity recognition and relation extraction in the biodiversity domain

BACKGROUND: Biodiversity is the assortment of life on earth covering evolutionary, ecological, biological, and social forms. To preserve life in all its variety and richness, it is imperative to monitor the current state of biodiversity and its change over time and to understand the forces driving i...

Bibliographic Details
Main authors: Abdelmageed, Nora; Löffler, Felicitas; Feddoul, Leila; Algergawy, Alsayed; Samuel, Sheeba; Gaikwad, Jitendra; Kazem, Anahita; König-Ries, Birgitta
Format: Online Article Text
Language: English
Published: Pensoft Publishers 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9836593/
https://www.ncbi.nlm.nih.gov/pubmed/36761617
http://dx.doi.org/10.3897/BDJ.10.e89481
_version_ 1784868902952501248
author Abdelmageed, Nora
Löffler, Felicitas
Feddoul, Leila
Algergawy, Alsayed
Samuel, Sheeba
Gaikwad, Jitendra
Kazem, Anahita
König-Ries, Birgitta
collection PubMed
description BACKGROUND: Biodiversity is the assortment of life on earth covering evolutionary, ecological, biological, and social forms. To preserve life in all its variety and richness, it is imperative to monitor the current state of biodiversity and its change over time and to understand the forces driving it. This need has resulted in numerous works being published in this field. With this, a large amount of textual data (publications) and metadata (e.g. dataset description) has been generated. To support the management and analysis of these data, two techniques from computer science are of interest, namely Named Entity Recognition (NER) and Relation Extraction (RE). While the former enables better content discovery and understanding, the latter fosters the analysis by detecting connections between entities and, thus, allows us to draw conclusions and answer relevant domain-specific questions. To automatically predict entities and their relations, machine/deep learning techniques could be used. The training and evaluation of those techniques require labelled corpora. NEW INFORMATION: In this paper, we present two gold-standard corpora for Named Entity Recognition (NER) and Relation Extraction (RE) generated from biodiversity datasets metadata and abstracts that can be used as evaluation benchmarks for the development of new computer-supported tools that require machine learning or deep learning techniques. These corpora are manually labelled and verified by biodiversity experts. In addition, we explain the detailed steps of constructing these datasets. Moreover, we demonstrate the underlying ontology for the classes and relations used to annotate such corpora.
format Online
Article
Text
id pubmed-9836593
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Pensoft Publishers
record_format MEDLINE/PubMed
spelling pubmed-9836593 2023-02-08 BiodivNERE: Gold standard corpora for named entity recognition and relation extraction in the biodiversity domain Abdelmageed, Nora Löffler, Felicitas Feddoul, Leila Algergawy, Alsayed Samuel, Sheeba Gaikwad, Jitendra Kazem, Anahita König-Ries, Birgitta Biodivers Data J Data Paper (Biosciences) BACKGROUND: Biodiversity is the assortment of life on earth covering evolutionary, ecological, biological, and social forms. To preserve life in all its variety and richness, it is imperative to monitor the current state of biodiversity and its change over time and to understand the forces driving it. This need has resulted in numerous works being published in this field. With this, a large amount of textual data (publications) and metadata (e.g. dataset description) has been generated. To support the management and analysis of these data, two techniques from computer science are of interest, namely Named Entity Recognition (NER) and Relation Extraction (RE). While the former enables better content discovery and understanding, the latter fosters the analysis by detecting connections between entities and, thus, allows us to draw conclusions and answer relevant domain-specific questions. To automatically predict entities and their relations, machine/deep learning techniques could be used. The training and evaluation of those techniques require labelled corpora. NEW INFORMATION: In this paper, we present two gold-standard corpora for Named Entity Recognition (NER) and Relation Extraction (RE) generated from biodiversity datasets metadata and abstracts that can be used as evaluation benchmarks for the development of new computer-supported tools that require machine learning or deep learning techniques. These corpora are manually labelled and verified by biodiversity experts. In addition, we explain the detailed steps of constructing these datasets. Moreover, we demonstrate the underlying ontology for the classes and relations used to annotate such corpora. Pensoft Publishers 2022-10-07 /pmc/articles/PMC9836593/ /pubmed/36761617 http://dx.doi.org/10.3897/BDJ.10.e89481 Text en Nora Abdelmageed, Felicitas Löffler, Leila Feddoul, Alsayed Algergawy, Sheeba Samuel, Jitendra Gaikwad, Anahita Kazem, Birgitta König-Ries https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title BiodivNERE: Gold standard corpora for named entity recognition and relation extraction in the biodiversity domain
topic Data Paper (Biosciences)
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9836593/
https://www.ncbi.nlm.nih.gov/pubmed/36761617
http://dx.doi.org/10.3897/BDJ.10.e89481