Extending TextAE for annotation of non-contiguous entities

Named entity recognition tools are used to identify mentions of biomedical entities in free text and are essential components of high-quality information retrieval and extraction systems. Without good entity recognition, methods will mislabel searched text and will miss important information or iden...

Bibliographic Details
Main Authors: Lever, Jake; Altman, Russ; Kim, Jin-Dong
Format: Online Article Text
Language: English
Published: Korea Genome Organization 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7362949/
https://www.ncbi.nlm.nih.gov/pubmed/32634869
http://dx.doi.org/10.5808/GI.2020.18.2.e15
author Lever, Jake
Altman, Russ
Kim, Jin-Dong
collection PubMed
description Named entity recognition tools are used to identify mentions of biomedical entities in free text and are essential components of high-quality information retrieval and extraction systems. Without good entity recognition, methods will mislabel searched text and will miss important information or identify spurious text that will frustrate users. Most tools do not capture non-contiguous entities, which are separate spans of text that together refer to an entity, e.g., the entity “type 1 diabetes” in the phrase “type 1 and type 2 diabetes.” This type of entity is commonly found in biomedical texts, especially in lists, where multiple biomedical entities are named in shortened form to avoid repeating words. Most text annotation systems that enable users to view and edit entity annotations do not support non-contiguous entities. Therefore, experts cannot even visualize non-contiguous entities, let alone annotate them to build valuable datasets for machine learning methods. To combat this problem and as part of the BLAH6 hackathon, we extended the TextAE platform to allow visualization and annotation of non-contiguous entities. This enables users to add new subspans to existing entities by selecting additional text. We integrate this new functionality with TextAE’s existing editing functionality to allow easy changes to entity annotation and editing of relation annotations involving non-contiguous entities, with importing and exporting to the PubAnnotation format. Finally, we roughly quantify the problem across the entire accessible biomedical literature to highlight that there are a substantial number of non-contiguous entities that appear in lists that would be missed by most text mining systems.
format Online
Article
Text
id pubmed-7362949
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Korea Genome Organization
record_format MEDLINE/PubMed
spelling pubmed-7362949 2020-07-23 Extending TextAE for annotation of non-contiguous entities Lever, Jake Altman, Russ Kim, Jin-Dong Genomics Inform Original Article
Korea Genome Organization 2020-06-15 /pmc/articles/PMC7362949/ /pubmed/32634869 http://dx.doi.org/10.5808/GI.2020.18.2.e15 Text en (c) 2020, Korea Genome Organization (CC) This is an open-access article distributed under the terms of the Creative Commons Attribution license (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Extending TextAE for annotation of non-contiguous entities
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7362949/
https://www.ncbi.nlm.nih.gov/pubmed/32634869
http://dx.doi.org/10.5808/GI.2020.18.2.e15