
NEREL-BIO: a dataset of biomedical abstracts annotated with nested named entities

MOTIVATION: This article describes NEREL-BIO, an annotation scheme and corpus of PubMed abstracts in Russian and a smaller number of abstracts in English. NEREL-BIO extends the general-domain dataset NEREL by introducing domain-specific entity types. The NEREL-BIO annotation scheme covers both general and...


Bibliographic Details
Main Authors: Loukachevitch, Natalia, Manandhar, Suresh, Baral, Elina, Rozhkov, Igor, Braslavski, Pavel, Ivanov, Vladimir, Batura, Tatiana, Tutubalina, Elena
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10129873/
https://www.ncbi.nlm.nih.gov/pubmed/37004189
http://dx.doi.org/10.1093/bioinformatics/btad161
_version_ 1785030850021163008
author Loukachevitch, Natalia
Manandhar, Suresh
Baral, Elina
Rozhkov, Igor
Braslavski, Pavel
Ivanov, Vladimir
Batura, Tatiana
Tutubalina, Elena
author_sort Loukachevitch, Natalia
collection PubMed
description MOTIVATION: This article describes NEREL-BIO, an annotation scheme and corpus of PubMed abstracts in Russian and a smaller number of abstracts in English. NEREL-BIO extends the general-domain dataset NEREL by introducing domain-specific entity types. The NEREL-BIO annotation scheme covers both general and biomedical domains, making it suitable for domain transfer experiments. NEREL-BIO provides annotation for nested named entities as an extension of the scheme employed for NEREL. Nested named entities may cross entity boundaries to connect to shorter entities nested within longer entities, making them harder to detect. RESULTS: NEREL-BIO contains annotations for 700+ Russian and 100+ English abstracts. All English PubMed annotations have corresponding Russian counterparts. Thus, NEREL-BIO has the following specific features: it provides annotation of nested named entities, and it can be used as a benchmark for cross-domain (NEREL → NEREL-BIO) and cross-language (English → Russian) transfer. We experiment with both transformer-based sequence models and machine reading comprehension models and report their results. AVAILABILITY AND IMPLEMENTATION: The dataset and annotation guidelines are freely available at https://github.com/nerel-ds/NEREL-BIO.
format Online
Article
Text
id pubmed-10129873
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10129873 2023-04-27 NEREL-BIO: a dataset of biomedical abstracts annotated with nested named entities Loukachevitch, Natalia Manandhar, Suresh Baral, Elina Rozhkov, Igor Braslavski, Pavel Ivanov, Vladimir Batura, Tatiana Tutubalina, Elena Bioinformatics Original Paper MOTIVATION: This article describes NEREL-BIO, an annotation scheme and corpus of PubMed abstracts in Russian and a smaller number of abstracts in English. NEREL-BIO extends the general-domain dataset NEREL by introducing domain-specific entity types. The NEREL-BIO annotation scheme covers both general and biomedical domains, making it suitable for domain transfer experiments. NEREL-BIO provides annotation for nested named entities as an extension of the scheme employed for NEREL. Nested named entities may cross entity boundaries to connect to shorter entities nested within longer entities, making them harder to detect. RESULTS: NEREL-BIO contains annotations for 700+ Russian and 100+ English abstracts. All English PubMed annotations have corresponding Russian counterparts. Thus, NEREL-BIO has the following specific features: it provides annotation of nested named entities, and it can be used as a benchmark for cross-domain (NEREL → NEREL-BIO) and cross-language (English → Russian) transfer. We experiment with both transformer-based sequence models and machine reading comprehension models and report their results. AVAILABILITY AND IMPLEMENTATION: The dataset and annotation guidelines are freely available at https://github.com/nerel-ds/NEREL-BIO. Oxford University Press 2023-04-02 /pmc/articles/PMC10129873/ /pubmed/37004189 http://dx.doi.org/10.1093/bioinformatics/btad161 Text en © The Author(s) 2023. Published by Oxford University Press.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title NEREL-BIO: a dataset of biomedical abstracts annotated with nested named entities
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10129873/
https://www.ncbi.nlm.nih.gov/pubmed/37004189
http://dx.doi.org/10.1093/bioinformatics/btad161