On Biomedical Named Entity Recognition: Experiments in Interlingual Transfer for Clinical and Social Media Texts

Bibliographic Details
Main authors: Miftahutdinov, Zulfat, Alimova, Ilseyar, Tutubalina, Elena
Format: Online Article Text
Language: English
Published: 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148079/
http://dx.doi.org/10.1007/978-3-030-45442-5_35
Collection: PubMed
Abstract: Although deep neural networks yield state-of-the-art performance in biomedical named entity recognition (bioNER), much research shares one limitation: models are usually trained and evaluated on English texts from a single domain. In this work, we present a fine-grained evaluation intended to understand the efficiency of multilingual BERT-based models for bioNER of drug and disease mentions across two domains in two languages, namely clinical data and user-generated texts on drug therapy in English and Russian. We investigate the role of transfer learning (TL) strategies between four corpora to reduce the number of examples that have to be manually annotated. Evaluation results demonstrate that multi-BERT shows the best transfer capabilities in the zero-shot setting when training and test sets are either in the same language or in the same domain. TL reduces the amount of labeled data needed to achieve high performance on three out of four corpora: pretrained models reach 98–99% of the full dataset performance on both types of entities after training on 10–25% of sentences. We demonstrate that pretraining on data with one or both types of transfer can be effective.
Record ID: pubmed-7148079
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Published in: Advances in Information Retrieval
Publication date: 2020-03-24
Rights: © Springer Nature Switzerland AG 2020. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.