
Dependency parsing of biomedical text with BERT

BACKGROUND: Syntactic analysis, or parsing, is a key task in natural language processing and a required component for many text mining approaches. In recent years, Universal Dependencies (UD) has emerged as the leading formalism for dependency parsing. While a number of recent tasks centering on U...


Bibliographic Details
Main Authors: Kanerva, Jenna; Ginter, Filip; Pyysalo, Sampo
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects: Research
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7771067/
https://www.ncbi.nlm.nih.gov/pubmed/33372589
http://dx.doi.org/10.1186/s12859-020-03905-8
author Kanerva, Jenna
Ginter, Filip
Pyysalo, Sampo
collection PubMed
description BACKGROUND: Syntactic analysis, or parsing, is a key task in natural language processing and a required component for many text mining approaches. In recent years, Universal Dependencies (UD) has emerged as the leading formalism for dependency parsing. While a number of recent tasks centering on UD have substantially advanced the state of the art in multilingual parsing, there has been only little study of parsing texts from specialized domains such as biomedicine. METHODS: We explore the application of state-of-the-art neural dependency parsing methods to biomedical text using the recently introduced CRAFT-SA shared task dataset. The CRAFT-SA task broadly follows the UD representation and recent UD task conventions, allowing us to fine-tune the UD-compatible Turku Neural Parser and UDify neural parsers to the task. We further evaluate the effect of transfer learning using a broad selection of BERT models, including several models pre-trained specifically for biomedical text processing. RESULTS: We find that recently introduced neural parsing technology is capable of generating highly accurate analyses of biomedical text, substantially improving on the best performance reported in the original CRAFT-SA shared task. We also find that initialization using a deep transfer learning model pre-trained on in-domain texts is key to maximizing the performance of the parsing methods.
format Online
Article
Text
id pubmed-7771067
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7771067 2020-12-30 Dependency parsing of biomedical text with BERT Kanerva, Jenna Ginter, Filip Pyysalo, Sampo BMC Bioinformatics Research
BioMed Central 2020-12-29 /pmc/articles/PMC7771067/ /pubmed/33372589 http://dx.doi.org/10.1186/s12859-020-03905-8 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Dependency parsing of biomedical text with BERT
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7771067/
https://www.ncbi.nlm.nih.gov/pubmed/33372589
http://dx.doi.org/10.1186/s12859-020-03905-8