Do syntactic trees enhance Bidirectional Encoder Representations from Transformers (BERT) models for chemical–drug relation extraction?

Collecting relations between chemicals and drugs is crucial in biomedical research. Pre-trained transformer models such as Bidirectional Encoder Representations from Transformers (BERT) have shown limitations on biomedical texts; more specifically, the lack of annotated data makes relation extraction (RE) from biomedical texts very challenging. In this paper, we hypothesize that enriching a pre-trained transformer model with syntactic information may help improve its performance on chemical–drug RE tasks. For this purpose, we propose three syntax-enhanced models based on the domain-specific BioBERT model: Chunking-Enhanced-BioBERT and Constituency-Tree-BioBERT, in which constituency information is integrated, and a multi-task learning framework, Multi-Task-Syntactic (MTS)-BioBERT, in which syntactic information is injected implicitly by adding syntax-related tasks as training objectives. In addition, we test an existing model, Late-Fusion, which is enhanced by syntactic dependency information, and build ensemble systems combining syntax-enhanced and non-syntax-enhanced models. Experiments are conducted on the BioCreative VII DrugProt corpus, a manually annotated corpus for the development and evaluation of RE systems. Our results reveal that syntax-enhanced models generally degrade the performance of BioBERT on biomedical RE but improve it when the subject–object distance of a candidate semantic relation is long. We also explore the impact of the quality of dependency parses. Our code is available at https://github.com/Maple177/syntax-enhanced-RE/tree/drugprot (MTS-BioBERT only) and https://github.com/Maple177/drugprot-relation-extraction (all other experiments). Database URL: https://github.com/Maple177/drugprot-relation-extraction
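The multi-task idea in the abstract can be pictured as a shared encoder with one head per training objective. The sketch below is a minimal illustration, not the authors' implementation (their code is at the GitHub links above): it assumes BioBERT as the shared encoder, per-token syntactic tree depth as an example of a syntax-related auxiliary task, and a hypothetical loss weight alpha; the class and parameter names are invented for the example.

```python
# Illustrative sketch of a multi-task setup in the spirit of MTS-BioBERT:
# a shared BioBERT encoder trained jointly on relation classification (main task)
# and a syntax-related auxiliary task (here, per-token parse-tree depth).
import torch
import torch.nn as nn
from transformers import AutoModel

class MultiTaskSyntaxRE(nn.Module):
    def __init__(self, n_relations: int, max_tree_depth: int = 16, alpha: float = 0.5):
        super().__init__()
        # Shared encoder; model name is one public BioBERT checkpoint, assumed here.
        self.encoder = AutoModel.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
        hidden = self.encoder.config.hidden_size
        self.re_head = nn.Linear(hidden, n_relations)        # main task: relation label
        self.depth_head = nn.Linear(hidden, max_tree_depth)  # auxiliary: token tree depth
        self.alpha = alpha                                   # assumed auxiliary-loss weight
        self.ce = nn.CrossEntropyLoss(ignore_index=-100)     # -100 masks padding tokens

    def forward(self, input_ids, attention_mask, re_labels=None, depth_labels=None):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]                 # [CLS] vector for the relation
        re_logits = self.re_head(pooled)
        depth_logits = self.depth_head(out.last_hidden_state)  # one prediction per token
        loss = None
        if re_labels is not None and depth_labels is not None:
            # Joint objective: main RE loss plus weighted syntax-related loss.
            loss = self.ce(re_logits, re_labels) + self.alpha * self.ce(
                depth_logits.view(-1, depth_logits.size(-1)), depth_labels.view(-1)
            )
        return loss, re_logits
```

In this setup the gradient from the auxiliary loss nudges the shared encoder toward syntax-aware representations, while the relation head is trained on the main RE labels; the paper's MTS-BioBERT follows this general recipe with its own choice of syntax-related objectives.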

Bibliographic Details
Main Authors: Tang, Anfu, Deléger, Louise, Bossy, Robert, Zweigenbaum, Pierre, Nédellec, Claire
Format: Online Article Text
Language: English
Published: Oxford University Press, 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9408061/
https://www.ncbi.nlm.nih.gov/pubmed/36006843
http://dx.doi.org/10.1093/database/baac070
Journal: Database (Oxford), Original Article
Published online: 2022-08-25
Collection: PubMed (record pubmed-9408061), National Center for Biotechnology Information
License: © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com.