On Biomedical Named Entity Recognition: Experiments in Interlingual Transfer for Clinical and Social Media Texts
Although deep neural networks yield state-of-the-art performance in biomedical named entity recognition (bioNER), much research shares one limitation: models are usually trained and evaluated on English texts from a single domain. In this work, we present a fine-grained evaluation intended to unders...
Main authors: | Miftahutdinov, Zulfat; Alimova, Ilseyar; Tutubalina, Elena |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148079/ http://dx.doi.org/10.1007/978-3-030-45442-5_35 |
author | Miftahutdinov, Zulfat; Alimova, Ilseyar; Tutubalina, Elena |
---|---|
collection | PubMed |
description | Although deep neural networks yield state-of-the-art performance in biomedical named entity recognition (bioNER), much research shares one limitation: models are usually trained and evaluated on English texts from a single domain. In this work, we present a fine-grained evaluation intended to understand the efficiency of multilingual BERT-based models for bioNER of drug and disease mentions across two domains in two languages, namely clinical data and user-generated texts on drug therapy in English and Russian. We investigate the role of transfer learning (TL) strategies between four corpora to reduce the number of examples that have to be manually annotated. Evaluation results demonstrate that multi-BERT shows the best transfer capabilities in the zero-shot setting when training and test sets are either in the same language or in the same domain. TL reduces the amount of labeled data needed to achieve high performance on three out of four corpora: pretrained models reach 98–99% of the full dataset performance on both types of entities after training on 10–25% of sentences. We demonstrate that pretraining on data with one or both types of transfer can be effective. |
format | Online Article Text |
id | pubmed-7148079 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
spelling | pubmed-7148079; 2020-04-13; Advances in Information Retrieval; Article; 2020-03-24; /pmc/articles/PMC7148079/; http://dx.doi.org/10.1007/978-3-030-45442-5_35; Text; en; © Springer Nature Switzerland AG 2020. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
title | On Biomedical Named Entity Recognition: Experiments in Interlingual Transfer for Clinical and Social Media Texts |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148079/ http://dx.doi.org/10.1007/978-3-030-45442-5_35 |
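The abstract reports a relative result: after training on 10–25% of sentences, pretrained models reach 98–99% of the performance obtained with the full training set. The bookkeeping behind such a statement can be sketched in Python as below. This is a minimal illustration, not the authors' code: it assumes training subsets are drawn uniformly at random at the sentence level and that "performance" is a single scalar score such as F1. The function names `sample_sentences` and `relative_performance` are hypothetical, and the scores in the usage lines are placeholders, not results from the paper.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_sentences(sentences: Sequence[T], fraction: float, seed: int = 0) -> List[T]:
    """Hypothetical helper: draw a random subset holding `fraction` of the sentences.

    Assumes uniform sampling without replacement; the paper's actual
    subsampling procedure is not described in this record.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(len(sentences) * fraction))  # keep at least one sentence
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    return rng.sample(list(sentences), k)


def relative_performance(subset_score: float, full_score: float) -> float:
    """Hypothetical helper: express a subset-trained score as a % of the full-data score."""
    if full_score <= 0:
        raise ValueError("full_score must be positive")
    return 100.0 * subset_score / full_score


if __name__ == "__main__":
    train = [f"sentence {i}" for i in range(1000)]  # placeholder corpus
    for frac in (0.10, 0.25):
        subset = sample_sentences(train, frac, seed=13)
        print(frac, len(subset))
    # Placeholder F1 scores, NOT taken from the paper:
    print(round(relative_performance(0.784, 0.800), 1))
```

Fixing the random seed per fraction keeps each subset reproducible, so scores at 10% and 25% can be compared across runs; averaging over several seeds would be a natural extension.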