BioVerbNet: a large semantic-syntactic classification of verbs in biomedicine
Main authors: | Majewska, Olga; Collins, Charlotte; Baker, Simon; Björne, Jari; Brown, Susan Windisch; Korhonen, Anna; Palmer, Martha |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8280585/ https://www.ncbi.nlm.nih.gov/pubmed/34266499 http://dx.doi.org/10.1186/s13326-021-00247-z |
author | Majewska, Olga; Collins, Charlotte; Baker, Simon; Björne, Jari; Brown, Susan Windisch; Korhonen, Anna; Palmer, Martha |
collection | PubMed |
description | BACKGROUND: Recent advances in representation learning have enabled large strides in natural language understanding; however, verbal reasoning remains a challenge for state-of-the-art systems. External sources of structured, expert-curated verb-related knowledge have been shown to boost model performance in different Natural Language Processing (NLP) tasks where accurate handling of verb meaning and behaviour is critical. The cost and time required for manual lexicon construction have been a major obstacle to porting the benefits of such resources to NLP in specialised domains, such as biomedicine. To address this issue, we combine a neural classification method with expert annotation to create BioVerbNet. This new resource comprises 693 verbs assigned to 22 top-level and 117 fine-grained semantic-syntactic verb classes. We make this resource available complete with semantic roles and VerbNet-style syntactic frames. RESULTS: We demonstrate the utility of the new resource in boosting model performance in document- and sentence-level classification in biomedicine. We apply an established retrofitting method to harness the verb class membership knowledge from BioVerbNet and transform a pretrained word embedding space by pulling together verbs belonging to the same semantic-syntactic class. The BioVerbNet knowledge-aware embeddings surpass the non-specialised baseline by a significant margin on both tasks. CONCLUSION: This work introduces the first large, annotated semantic-syntactic classification of biomedical verbs, providing a detailed account of the annotation process, the key differences in verb behaviour between the general and biomedical domains, and the design choices made to accurately capture the meaning and properties of verbs used in biomedical texts. The demonstrated benefits of leveraging BioVerbNet in text classification suggest the resource could help systems better tackle challenging NLP tasks in biomedicine. |
id | pubmed-8280585 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | J Biomed Semantics |
published | BioMed Central, 2021-07-15 |
rights | © The Author(s) 2021 (https://creativecommons.org/licenses/by/4.0/). Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
topic | Database |
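The retrofitting step summarised in the abstract (transforming a pretrained embedding space by pulling together verbs that belong to the same semantic-syntactic class) can be sketched as follows. The record does not name the exact retrofitting method, so this is an illustration under assumptions: it implements the widely used iterative update of Faruqui et al. (2015), in which each verb vector is repeatedly replaced by a weighted average of its original vector and the current vectors of its class-mates. It may differ from the variant the authors actually used. The function `retrofit`, the toy two-dimensional vectors, and the class label are hypothetical; real inputs would be pretrained biomedical embeddings and BioVerbNet class memberships.

```python
from typing import Dict, List, Set

Vector = List[float]


def retrofit(
    embeddings: Dict[str, Vector],
    verb_classes: Dict[str, List[str]],
    iterations: int = 10,
    alpha: float = 1.0,
) -> Dict[str, Vector]:
    """Illustrative Faruqui-style retrofitting: pull class-mates' vectors together.

    Assumption: two verbs are neighbours if they share at least one class.
    The input dict is not modified; a new dict of vectors is returned.
    """
    # Build neighbour sets from class membership (verbs without vectors are skipped).
    neighbours: Dict[str, Set[str]] = {w: set() for w in embeddings}
    for members in verb_classes.values():
        present = [v for v in members if v in embeddings]
        for v in present:
            neighbours[v].update(u for u in present if u != v)

    original = {w: list(vec) for w, vec in embeddings.items()}
    new = {w: list(vec) for w, vec in embeddings.items()}

    for _ in range(iterations):
        for w, nbrs in neighbours.items():
            if not nbrs:
                continue  # verbs with no class-mates keep their pretrained vector
            beta = 1.0 / len(nbrs)  # each neighbour weighted by inverse degree
            total = [alpha * x for x in original[w]]  # anchor to the pretrained vector
            for u in nbrs:
                for k, x in enumerate(new[u]):
                    total[k] += beta * x
            denom = alpha + beta * len(nbrs)  # equals alpha + 1
            new[w] = [x / denom for x in total]  # updated in place for later neighbours
    return new


if __name__ == "__main__":
    # Hypothetical toy data: "bind" and "attach" share one made-up class.
    emb = {"bind": [1.0, 0.0], "attach": [0.0, 1.0], "inhibit": [5.0, 5.0]}
    classes = {"HYPOTHETICAL-CLASS-1": ["bind", "attach"]}
    print(retrofit(emb, classes, iterations=1))
    # → {'bind': [0.5, 0.5], 'attach': [0.25, 0.75], 'inhibit': [5.0, 5.0]}
```

In this sketch the weight `alpha` keeps each vector anchored to its pretrained position, while the inverse-degree neighbour weight keeps verbs in large classes from being dominated by their class-mates; verbs that belong to no class are left unchanged.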