Using rule-based natural language processing to improve disease normalization in biomedical text

Bibliographic Details
Main authors: Kang, Ning; Singh, Bharat; Afzal, Zubair; van Mulligen, Erik M; Kors, Jan A
Format: Online, Article, Text
Language: English
Published: BMJ Publishing Group, 2013
Subjects: Research and Applications
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3756254/
https://www.ncbi.nlm.nih.gov/pubmed/23043124
http://dx.doi.org/10.1136/amiajnl-2012-001173
Collection: PubMed
Abstract:
BACKGROUND AND OBJECTIVE: In order for computers to extract useful information from unstructured text, a concept normalization system is needed to link relevant concepts in a text to sources that contain further information about those concepts. Popular concept normalization tools in the biomedical field are dictionary-based. In this study we investigate the usefulness of natural language processing (NLP) as an adjunct to dictionary-based concept normalization.
METHODS: We compared the performance of two biomedical concept normalization systems, MetaMap and Peregrine, on the Arizona Disease Corpus, with and without the use of a rule-based NLP module. Performance was assessed for exact and inexact boundary matching of the system annotations with those of the gold standard, and for concept identifier matching.
RESULTS: Without the NLP module, MetaMap and Peregrine attained F-scores of 61.0% and 63.9%, respectively, for exact boundary matching, and 55.1% and 56.9% for concept identifier matching. With the aid of the NLP module, the F-scores of MetaMap and Peregrine improved to 73.3% and 78.0% for exact boundary matching, and to 66.2% and 69.8% for concept identifier matching. For inexact boundary matching, performance further increased to 85.5% and 85.4%, and to 73.6% and 73.3% for concept identifier matching.
CONCLUSIONS: We have shown the added value of NLP for the recognition and normalization of diseases with MetaMap and Peregrine. The NLP module is general and can be applied in combination with any concept normalization system. Whether its use for concept types other than disease is equally advantageous remains to be investigated.
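The abstract scores system annotations against the gold standard under three criteria: exact boundary matching, inexact boundary matching, and concept identifier matching. The Python sketch below illustrates how such F-scores could be computed; it is not the authors' evaluation code. It assumes annotations are half-open character spans, treats "inexact" as any overlap between a system span and a gold span, and requires a concept identifier match to also satisfy the chosen boundary criterion. The names `Annotation`, `f_score`, and the `CUI_*` identifiers are hypothetical.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    """Hypothetical annotation: half-open character span [start, end) plus a concept identifier."""
    start: int
    end: int
    concept_id: str = ""


def exact_match(a: Annotation, b: Annotation) -> bool:
    # Exact boundary matching: both start and end offsets must coincide.
    return a.start == b.start and a.end == b.end


def overlap_match(a: Annotation, b: Annotation) -> bool:
    # Inexact boundary matching (assumed here to mean any overlap):
    # the two half-open spans share at least one character.
    return a.start < b.end and b.start < a.end


def f_score(system: List[Annotation], gold: List[Annotation],
            inexact: bool = False, check_concept: bool = False) -> float:
    """Hypothetical helper returning F1 of system annotations against gold annotations."""
    span_ok = overlap_match if inexact else exact_match

    def matches(s: Annotation, g: Annotation) -> bool:
        if not span_ok(s, g):
            return False
        # Concept identifier matching additionally requires the same identifier.
        return not check_concept or s.concept_id == g.concept_id

    if not system or not gold:
        return 0.0
    # Precision: fraction of system annotations that match some gold annotation.
    precision = sum(any(matches(s, g) for g in gold) for s in system) / len(system)
    # Recall: fraction of gold annotations matched by some system annotation.
    recall = sum(any(matches(s, g) for s in system) for g in gold) / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: the second system span is truncated and carries the wrong identifier.
gold = [Annotation(0, 13, "CUI_1"), Annotation(20, 35, "CUI_2")]
system = [Annotation(0, 13, "CUI_1"), Annotation(26, 35, "CUI_3")]
print(f_score(system, gold))                                    # 0.5
print(f_score(system, gold, inexact=True))                      # 1.0
print(f_score(system, gold, inexact=True, check_concept=True))  # 0.5
```

Counting precision over system annotations and recall over gold annotations separately is one common convention for overlap matching; other evaluations enforce a one-to-one alignment between system and gold spans, which can give lower scores when several spans overlap the same annotation.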
Record ID: pubmed-3756254
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Record: pubmed-3756254, 2013-12-11
Journal: J Am Med Inform Assoc (Research and Applications)
Publisher: BMJ Publishing Group, 2013-09, 2012-10-06
Rights: Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions. This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/3.0/