Do syntactic trees enhance Bidirectional Encoder Representations from Transformers (BERT) models for chemical–drug relation extraction?
Collecting relations between chemicals and drugs is crucial in biomedical research. Pre-trained transformer models such as Bidirectional Encoder Representations from Transformers (BERT) have shown limitations on biomedical texts; more specifically, the scarcity of annotated data makes relation extraction (RE) from biomedical texts very challenging.
Main Authors: | Tang, Anfu; Deléger, Louise; Bossy, Robert; Zweigenbaum, Pierre; Nédellec, Claire |
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9408061/ https://www.ncbi.nlm.nih.gov/pubmed/36006843 http://dx.doi.org/10.1093/database/baac070 |
_version_ | 1784774515091308544 |
author | Tang, Anfu Deléger, Louise Bossy, Robert Zweigenbaum, Pierre Nédellec, Claire |
author_facet | Tang, Anfu Deléger, Louise Bossy, Robert Zweigenbaum, Pierre Nédellec, Claire |
author_sort | Tang, Anfu |
collection | PubMed |
description | Collecting relations between chemicals and drugs is crucial in biomedical research. Pre-trained transformer models such as Bidirectional Encoder Representations from Transformers (BERT) have shown limitations on biomedical texts; more specifically, the scarcity of annotated data makes relation extraction (RE) from biomedical texts very challenging. In this paper, we hypothesize that enriching a pre-trained transformer model with syntactic information may help improve its performance on chemical–drug RE tasks. For this purpose, we propose three syntax-enhanced models based on the domain-specific BioBERT model: Chunking-Enhanced-BioBERT and Constituency-Tree-BioBERT, in which constituency information is integrated, and a Multi-Task-Learning framework, Multi-Task-Syntactic (MTS)-BioBERT, in which syntactic information is injected implicitly by adding syntax-related tasks as training objectives. In addition, we test an existing model, Late-Fusion, which is enhanced with syntactic dependency information, and build ensemble systems combining syntax-enhanced and non-syntax-enhanced models. Experiments are conducted on the BioCreative VII DrugProt corpus, a manually annotated corpus for the development and evaluation of RE systems. Our results reveal that syntax-enhanced models generally degrade the performance of BioBERT on biomedical RE but improve it when the subject–object distance of a candidate semantic relation is long. We also explore the impact of the quality of dependency parses. [Our code is available at: https://github.com/Maple177/syntax-enhanced-RE/tree/drugprot (for MTS-BioBERT only); https://github.com/Maple177/drugprot-relation-extraction (for the rest of the experiments)] Database URL https://github.com/Maple177/drugprot-relation-extraction |
format | Online Article Text |
id | pubmed-9408061 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9408061 2022-08-26 Do syntactic trees enhance Bidirectional Encoder Representations from Transformers (BERT) models for chemical–drug relation extraction? Tang, Anfu Deléger, Louise Bossy, Robert Zweigenbaum, Pierre Nédellec, Claire Database (Oxford) Original Article Collecting relations between chemicals and drugs is crucial in biomedical research. Pre-trained transformer models such as Bidirectional Encoder Representations from Transformers (BERT) have shown limitations on biomedical texts; more specifically, the scarcity of annotated data makes relation extraction (RE) from biomedical texts very challenging. In this paper, we hypothesize that enriching a pre-trained transformer model with syntactic information may help improve its performance on chemical–drug RE tasks. For this purpose, we propose three syntax-enhanced models based on the domain-specific BioBERT model: Chunking-Enhanced-BioBERT and Constituency-Tree-BioBERT, in which constituency information is integrated, and a Multi-Task-Learning framework, Multi-Task-Syntactic (MTS)-BioBERT, in which syntactic information is injected implicitly by adding syntax-related tasks as training objectives. In addition, we test an existing model, Late-Fusion, which is enhanced with syntactic dependency information, and build ensemble systems combining syntax-enhanced and non-syntax-enhanced models. Experiments are conducted on the BioCreative VII DrugProt corpus, a manually annotated corpus for the development and evaluation of RE systems. Our results reveal that syntax-enhanced models generally degrade the performance of BioBERT on biomedical RE but improve it when the subject–object distance of a candidate semantic relation is long. We also explore the impact of the quality of dependency parses.
[Our code is available at: https://github.com/Maple177/syntax-enhanced-RE/tree/drugprot (for MTS-BioBERT only); https://github.com/Maple177/drugprot-relation-extraction (for the rest of the experiments)] Database URL https://github.com/Maple177/drugprot-relation-extraction Oxford University Press 2022-08-25 /pmc/articles/PMC9408061/ /pubmed/36006843 http://dx.doi.org/10.1093/database/baac070 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Original Article Tang, Anfu Deléger, Louise Bossy, Robert Zweigenbaum, Pierre Nédellec, Claire Do syntactic trees enhance Bidirectional Encoder Representations from Transformers (BERT) models for chemical–drug relation extraction? |
title | Do syntactic trees enhance Bidirectional Encoder Representations from Transformers (BERT) models for chemical–drug relation extraction? |
title_full | Do syntactic trees enhance Bidirectional Encoder Representations from Transformers (BERT) models for chemical–drug relation extraction? |
title_fullStr | Do syntactic trees enhance Bidirectional Encoder Representations from Transformers (BERT) models for chemical–drug relation extraction? |
title_full_unstemmed | Do syntactic trees enhance Bidirectional Encoder Representations from Transformers (BERT) models for chemical–drug relation extraction? |
title_short | Do syntactic trees enhance Bidirectional Encoder Representations from Transformers (BERT) models for chemical–drug relation extraction? |
title_sort | do syntactic trees enhance bidirectional encoder representations from transformers (bert) models for chemical–drug relation extraction? |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9408061/ https://www.ncbi.nlm.nih.gov/pubmed/36006843 http://dx.doi.org/10.1093/database/baac070 |
work_keys_str_mv | AT tanganfu dosyntactictreesenhancebidirectionalencoderrepresentationsfromtransformersbertmodelsforchemicaldrugrelationextraction AT delegerlouise dosyntactictreesenhancebidirectionalencoderrepresentationsfromtransformersbertmodelsforchemicaldrugrelationextraction AT bossyrobert dosyntactictreesenhancebidirectionalencoderrepresentationsfromtransformersbertmodelsforchemicaldrugrelationextraction AT zweigenbaumpierre dosyntactictreesenhancebidirectionalencoderrepresentationsfromtransformersbertmodelsforchemicaldrugrelationextraction AT nedellecclaire dosyntactictreesenhancebidirectionalencoderrepresentationsfromtransformersbertmodelsforchemicaldrugrelationextraction |