
BioVerbNet: a large semantic-syntactic classification of verbs in biomedicine

BACKGROUND: Recent advances in representation learning have enabled large strides in natural language understanding; however, verbal reasoning remains a challenge for state-of-the-art systems. External sources of structured, expert-curated verb-related knowledge have been shown to boost model perfor...

Bibliographic Details
Main authors: Majewska, Olga, Collins, Charlotte, Baker, Simon, Björne, Jari, Brown, Susan Windisch, Korhonen, Anna, Palmer, Martha
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8280585/
https://www.ncbi.nlm.nih.gov/pubmed/34266499
http://dx.doi.org/10.1186/s13326-021-00247-z
author Majewska, Olga
Collins, Charlotte
Baker, Simon
Björne, Jari
Brown, Susan Windisch
Korhonen, Anna
Palmer, Martha
collection PubMed
description BACKGROUND: Recent advances in representation learning have enabled large strides in natural language understanding; however, verbal reasoning remains a challenge for state-of-the-art systems. External sources of structured, expert-curated verb-related knowledge have been shown to boost model performance in different Natural Language Processing (NLP) tasks where accurate handling of verb meaning and behaviour is critical. The cost and time required for manual lexicon construction have been a major obstacle to porting the benefits of such resources to NLP in specialised domains, such as biomedicine. To address this issue, we combine a neural classification method with expert annotation to create BioVerbNet. This new resource comprises 693 verbs assigned to 22 top-level and 117 fine-grained semantic-syntactic verb classes. We make this resource available complete with semantic roles and VerbNet-style syntactic frames. RESULTS: We demonstrate the utility of the new resource in boosting model performance in document- and sentence-level classification in biomedicine. We apply an established retrofitting method to harness the verb class membership knowledge from BioVerbNet and transform a pretrained word embedding space by pulling together verbs belonging to the same semantic-syntactic class. The BioVerbNet knowledge-aware embeddings surpass the non-specialised baseline by a significant margin on both tasks. CONCLUSION: This work introduces the first large, annotated semantic-syntactic classification of biomedical verbs, providing a detailed account of the annotation process, the key differences in verb behaviour between the general and biomedical domains, and the design choices made to accurately capture the meaning and properties of verbs used in biomedical texts. The demonstrated benefits of leveraging BioVerbNet in text classification suggest the resource could help systems better tackle challenging NLP tasks in biomedicine.
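The retrofitting step mentioned in the description (pulling together verbs of the same class in a pretrained embedding space) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one common formulation of retrofitting, an iterative weighted average of each word's original vector and the current vectors of its class-mates, with uniform weights. The function name `retrofit`, the parameters `alpha` and `beta`, the class label, and the toy vectors are all hypothetical.

```python
def retrofit(vectors, classes, iterations=10, alpha=1.0, beta=1.0):
    """Return new vectors in which words sharing a class are pulled together.

    vectors: dict mapping word -> list of floats (pretrained embedding)
    classes: dict mapping class label -> list of member words
    alpha:   weight on the word's original vector (assumed uniform)
    beta:    weight on each class-mate's current vector (assumed uniform)
    """
    # Build a neighbour set per word: every other in-vocabulary class-mate.
    neighbours = {word: set() for word in vectors}
    for members in classes.values():
        present = [w for w in members if w in vectors]
        for w in present:
            neighbours[w].update(x for x in present if x != w)

    # Work on copies so the pretrained vectors stay untouched.
    new = {word: list(vec) for word, vec in vectors.items()}
    for _ in range(iterations):
        for word, nbrs in neighbours.items():
            if not nbrs:
                continue  # words without class-mates keep their vector
            denom = alpha + beta * len(nbrs)
            new[word] = [
                (alpha * vectors[word][d] + beta * sum(new[n][d] for n in nbrs)) / denom
                for d in range(len(vectors[word]))
            ]
    return new


# Toy usage with made-up 2-d vectors and a made-up class grouping.
toy_vectors = {
    "activate": [1.0, 0.0],
    "stimulate": [0.0, 1.0],
    "bind": [5.0, 5.0],
}
toy_classes = {"hypothetical_class": ["activate", "stimulate"]}
retrofitted = retrofit(toy_vectors, toy_classes)
```

In this toy run the two class-mates move towards each other while "bind", which has no class-mate, keeps its original vector. The weighting scheme and number of iterations are illustrative choices, not values reported in the record.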
format Online
Article
Text
id pubmed-8280585
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8280585 2021-07-16 BioVerbNet: a large semantic-syntactic classification of verbs in biomedicine Majewska, Olga Collins, Charlotte Baker, Simon Björne, Jari Brown, Susan Windisch Korhonen, Anna Palmer, Martha J Biomed Semantics Database BioMed Central 2021-07-15 /pmc/articles/PMC8280585/ /pubmed/34266499 http://dx.doi.org/10.1186/s13326-021-00247-z Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title BioVerbNet: a large semantic-syntactic classification of verbs in biomedicine
topic Database