
Unified Medical Language System resources improve sieve-based generation and Bidirectional Encoder Representations from Transformers (BERT)–based ranking for concept normalization


Bibliographic Details
Main Authors: Xu, Dongfang, Gopale, Manoj, Zhang, Jiacheng, Brown, Kris, Begoli, Edmon, Bethard, Steven
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects: Research and Applications
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7566510/
https://www.ncbi.nlm.nih.gov/pubmed/32719838
http://dx.doi.org/10.1093/jamia/ocaa080
author Xu, Dongfang
Gopale, Manoj
Zhang, Jiacheng
Brown, Kris
Begoli, Edmon
Bethard, Steven
author_sort Xu, Dongfang
collection PubMed
description OBJECTIVE: Concept normalization, the task of linking phrases in text to concepts in an ontology, is useful for many downstream tasks, including relation extraction and information retrieval. We present a generate-and-rank concept normalization system based on our participation in the 2019 National NLP Clinical Challenges Shared Task Track 3 Concept Normalization.
MATERIALS AND METHODS: The shared task provided 13 609 concept mentions drawn from 100 discharge summaries. We first design a sieve-based system that uses Lucene indices over the training data, Unified Medical Language System (UMLS) preferred terms, and UMLS synonyms to generate a list of possible concepts for each mention. We then design a listwise classifier based on the BERT (Bidirectional Encoder Representations from Transformers) neural network to rank the candidate concepts, integrating UMLS semantic types through a regularizer.
RESULTS: Our generate-and-rank system placed third of 33 in the competition, outperforming the candidate generator alone (81.66% vs 79.44%) and the previous state of the art (76.35%). During postevaluation, the model's accuracy increased to 83.56% through improvements to how training data are generated from the UMLS and incorporation of our UMLS semantic type regularizer.
DISCUSSION: Analysis of the model shows that prioritizing UMLS preferred terms yields better performance, that the UMLS semantic type regularizer produces qualitatively better concept predictions, and that the model performs well even on concepts not seen during training.
CONCLUSIONS: Our generate-and-rank framework for UMLS concept normalization integrates key UMLS features such as preferred terms and semantic types with a neural network–based ranking model to accurately link phrases in text to UMLS concepts.
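The description outlines a generate-and-rank pipeline: a sieve over three resources proposes candidate concepts, and a listwise BERT classifier ranks them. Below is a minimal sketch of that design, with plain Python dictionaries standing in for the paper's Lucene indices, an off-the-shelf bert-base-uncased checkpoint with an untrained scoring head standing in for the authors' fine-tuned ranker, and toy data throughout; every identifier and value here is an illustrative assumption, not the authors' code.

```python
# Minimal sketch of a generate-and-rank concept normalizer, assuming
# Python dicts in place of the paper's Lucene indices and an untrained
# scoring head in place of the authors' fine-tuned BERT ranker.
import torch
from transformers import AutoModel, AutoTokenizer

# Toy stand-ins for the three sieve resources (illustrative data only).
TRAIN_LOOKUP = {"heart attack": ["C0027051"]}             # mentions seen in training
UMLS_PREFERRED = {"myocardial infarction": ["C0027051"]}  # UMLS preferred terms
UMLS_SYNONYMS = {"mi": ["C0027051", "C0026264"]}          # UMLS synonyms (ambiguous)
CUI_TO_TERM = {"C0027051": "myocardial infarction",
               "C0026264": "mitral valve insufficiency"}

def generate_candidates(mention):
    """Sieve: query each resource in priority order, stop at the first hit."""
    key = mention.lower().strip()
    for sieve in (TRAIN_LOOKUP, UMLS_PREFERRED, UMLS_SYNONYMS):
        if key in sieve:
            return sieve[key]
    return []

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
scorer = torch.nn.Linear(encoder.config.hidden_size, 1)   # demo head, untrained

def rank_candidates(mention, cuis):
    """Listwise ranking: score every (mention, candidate term) pair with BERT,
    then normalize the scores over the whole candidate list at once."""
    terms = [CUI_TO_TERM[c] for c in cuis]
    batch = tokenizer([mention] * len(cuis), terms,
                      padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        cls = encoder(**batch).last_hidden_state[:, 0]    # [CLS] vector per pair
        probs = torch.softmax(scorer(cls).squeeze(-1), dim=0)
    return sorted(zip(cuis, probs.tolist()), key=lambda p: -p[1])

def listwise_loss(probs, gold_idx, cand_types, gold_type, lam=0.1):
    """One hedged reading of the training objective: listwise cross-entropy
    plus a penalty on probability assigned to candidates whose UMLS semantic
    type differs from the gold concept's type."""
    ce = -torch.log(probs[gold_idx])
    wrong = torch.tensor([t != gold_type for t in cand_types], dtype=torch.float)
    return ce + lam * (probs * wrong).sum()

print(rank_candidates("MI", generate_candidates("MI")))
```

The listwise_loss function is only one plausible way to attach a semantic type regularizer to a listwise cross-entropy objective; the paper's exact formulation may differ.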
format Online
Article
Text
id pubmed-7566510
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7566510 2020-10-20
J Am Med Inform Assoc, Research and Applications
Oxford University Press 2020-07-27 /pmc/articles/PMC7566510/ /pubmed/32719838 http://dx.doi.org/10.1093/jamia/ocaa080
Text en © The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Unified Medical Language System resources improve sieve-based generation and Bidirectional Encoder Representations from Transformers (BERT)–based ranking for concept normalization
topic Research and Applications
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7566510/
https://www.ncbi.nlm.nih.gov/pubmed/32719838
http://dx.doi.org/10.1093/jamia/ocaa080