
Fine-Tuning Bidirectional Encoder Representations From Transformers (BERT)–Based Models on Large-Scale Electronic Health Record Notes: An Empirical Study

BACKGROUND: The bidirectional encoder representations from transformers (BERT) model has achieved great success in many natural language processing (NLP) tasks, such as named entity recognition and question answering. However, little prior work has explored this model to be used for an important task in the biomedical and clinical domains, namely entity normalization. OBJECTIVE: We aim to investigate the effectiveness of BERT-based models for biomedical or clinical entity normalization. In addition, our second objective is to investigate whether the domains of training data influence the performances of BERT-based models as well as the degree of influence. METHODS: Our data was comprised of 1.5 million unlabeled electronic health record (EHR) notes. We first fine-tuned BioBERT on this large collection of unlabeled EHR notes. This generated our BERT-based model trained using 1.5 million electronic health record notes (EhrBERT). We then further fine-tuned EhrBERT, BioBERT, and BERT on three annotated corpora for biomedical and clinical entity normalization: the Medication, Indication, and Adverse Drug Events (MADE) 1.0 corpus, the National Center for Biotechnology Information (NCBI) disease corpus, and the Chemical-Disease Relations (CDR) corpus. We compared our models with two state-of-the-art normalization systems, namely MetaMap and disease name normalization (DNorm). RESULTS: EhrBERT achieved 40.95% F1 in the MADE 1.0 corpus for mapping named entities to the Medical Dictionary for Regulatory Activities and the Systematized Nomenclature of Medicine—Clinical Terms (SNOMED-CT), which have about 380,000 terms. In this corpus, EhrBERT outperformed MetaMap by 2.36% in F1. For the NCBI disease corpus and CDR corpus, EhrBERT also outperformed DNorm by improving the F1 scores from 88.37% and 89.92% to 90.35% and 93.82%, respectively. Compared with BioBERT and BERT, EhrBERT outperformed them on the MADE 1.0 corpus and the CDR corpus. CONCLUSIONS: Our work shows that BERT-based models have achieved state-of-the-art performance for biomedical and clinical entity normalization. BERT-based models can be readily fine-tuned to normalize any kind of named entities.

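The first stage described in METHODS above is continued pretraining of BioBERT on the unlabeled EHR notes to produce EhrBERT. The abstract gives no implementation details, so the following is only a minimal sketch of that kind of masked-language-model fine-tuning using the Hugging Face transformers and datasets libraries; the checkpoint name, file path, and hyperparameters are illustrative assumptions, not the authors' configuration.

    # Stage 1 sketch (assumption, not the authors' code): continue masked-language-model
    # training of BioBERT on a corpus of unlabeled EHR notes, yielding "EhrBERT".
    from datasets import load_dataset
    from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    base = "dmis-lab/biobert-base-cased-v1.1"   # a public BioBERT checkpoint (assumed)
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForMaskedLM.from_pretrained(base)

    # "ehr_notes.txt" is a hypothetical file with one de-identified note segment per line.
    notes = load_dataset("text", data_files={"train": "ehr_notes.txt"})
    tokenized = notes.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
        batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="ehrbert", num_train_epochs=1,
                               per_device_train_batch_size=32),
        train_dataset=tokenized["train"],
        data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                                      mlm_probability=0.15))
    trainer.train()
    trainer.save_model("ehrbert")               # the EHR-adapted encoder
    tokenizer.save_pretrained("ehrbert")

The same loop could be run from a vanilla BERT checkpoint instead; the abstract's comparison then fine-tunes each encoder (EhrBERT, BioBERT, BERT) on the annotated normalization corpora.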

Bibliographic Details
Main Authors: Li, Fei; Jin, Yonghao; Liu, Weisong; Rawat, Bhanu Pratap Singh; Cai, Pengshan; Yu, Hong
Format: Online Article Text
Language: English
Published: JMIR Publications 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6746103/
https://www.ncbi.nlm.nih.gov/pubmed/31516126
http://dx.doi.org/10.2196/14830
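Returning to the METHODS summarized in the abstract, the second stage fine-tunes EhrBERT, BioBERT, and BERT on the annotated MADE 1.0, NCBI disease, and CDR corpora for entity normalization. The abstract does not state the exact formulation, so the sketch below assumes one common way to cast normalization: scoring (mention, candidate concept term) pairs with a binary cross-encoder and choosing the highest-scoring MedDRA/SNOMED-CT candidate at inference time. The example mentions, candidate terms, and hyperparameters are hypothetical.

    # Stage 2 sketch (an assumed formulation, not necessarily the paper's architecture):
    # score mention/candidate-term pairs with a binary cross-encoder built on EhrBERT.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("ehrbert")   # saved by the stage-1 sketch
    model = AutoModelForSequenceClassification.from_pretrained("ehrbert", num_labels=2)

    # Hypothetical training triples: (mention from a note, candidate vocabulary term, match?).
    triples = [("heart attack", "Myocardial infarction", 1),
               ("heart attack", "Heart murmur", 0)]
    enc = tokenizer([m for m, _, _ in triples], [c for _, c, _ in triples],
                    padding=True, truncation=True, return_tensors="pt")
    labels = torch.tensor([y for _, _, y in triples])

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    loss = model(**enc, labels=labels).loss                 # cross-entropy on match / no match
    loss.backward()
    optimizer.step()

    # At inference, run every candidate term for a mention through the model and
    # normalize the mention to the concept whose term scores highest.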
author Li, Fei
Jin, Yonghao
Liu, Weisong
Rawat, Bhanu Pratap Singh
Cai, Pengshan
Yu, Hong
collection PubMed
description BACKGROUND: The bidirectional encoder representations from transformers (BERT) model has achieved great success in many natural language processing (NLP) tasks, such as named entity recognition and question answering. However, little prior work has explored this model to be used for an important task in the biomedical and clinical domains, namely entity normalization. OBJECTIVE: We aim to investigate the effectiveness of BERT-based models for biomedical or clinical entity normalization. In addition, our second objective is to investigate whether the domains of training data influence the performances of BERT-based models as well as the degree of influence. METHODS: Our data was comprised of 1.5 million unlabeled electronic health record (EHR) notes. We first fine-tuned BioBERT on this large collection of unlabeled EHR notes. This generated our BERT-based model trained using 1.5 million electronic health record notes (EhrBERT). We then further fine-tuned EhrBERT, BioBERT, and BERT on three annotated corpora for biomedical and clinical entity normalization: the Medication, Indication, and Adverse Drug Events (MADE) 1.0 corpus, the National Center for Biotechnology Information (NCBI) disease corpus, and the Chemical-Disease Relations (CDR) corpus. We compared our models with two state-of-the-art normalization systems, namely MetaMap and disease name normalization (DNorm). RESULTS: EhrBERT achieved 40.95% F1 in the MADE 1.0 corpus for mapping named entities to the Medical Dictionary for Regulatory Activities and the Systematized Nomenclature of Medicine—Clinical Terms (SNOMED-CT), which have about 380,000 terms. In this corpus, EhrBERT outperformed MetaMap by 2.36% in F1. For the NCBI disease corpus and CDR corpus, EhrBERT also outperformed DNorm by improving the F1 scores from 88.37% and 89.92% to 90.35% and 93.82%, respectively. Compared with BioBERT and BERT, EhrBERT outperformed them on the MADE 1.0 corpus and the CDR corpus. CONCLUSIONS: Our work shows that BERT-based models have achieved state-of-the-art performance for biomedical and clinical entity normalization. BERT-based models can be readily fine-tuned to normalize any kind of named entities.
format Online
Article
Text
id pubmed-6746103
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-6746103 2019-09-23 JMIR Med Inform Original Paper JMIR Publications 2019-09-12 /pmc/articles/PMC6746103/ /pubmed/31516126 http://dx.doi.org/10.2196/14830 Text en ©Fei Li, Yonghao Jin, Weisong Liu, Bhanu Pratap Singh Rawat, Pengshan Cai, Hong Yu. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 12.09.2019. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
title Fine-Tuning Bidirectional Encoder Representations From Transformers (BERT)–Based Models on Large-Scale Electronic Health Record Notes: An Empirical Study
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6746103/
https://www.ncbi.nlm.nih.gov/pubmed/31516126
http://dx.doi.org/10.2196/14830