Language model-based automatic prefix abbreviation expansion method for biomedical big data analysis

In the biomedical domain, abbreviations appear more and more frequently in various data sets, which has caused significant obstacles to biomedical big data analysis. The dictionary-based approach has been adopted to process abbreviations, but it cannot handle ad hoc abbreviations, and it is impos...

Bibliographic Details
Main Authors: Du, Xiaokun, Zhu, Rongbo, Li, Yanhong, Anjum, Ashiq
Format: Online Article Text
Language: English
Published: Elsevier B.V. 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7103364/
https://www.ncbi.nlm.nih.gov/pubmed/32287562
http://dx.doi.org/10.1016/j.future.2019.01.016
_version_ 1783512042872766464
author Du, Xiaokun
Zhu, Rongbo
Li, Yanhong
Anjum, Ashiq
author_facet Du, Xiaokun
Zhu, Rongbo
Li, Yanhong
Anjum, Ashiq
author_sort Du, Xiaokun
collection PubMed
description In the biomedical domain, abbreviations appear more and more frequently in various data sets, which has caused significant obstacles to biomedical big data analysis. The dictionary-based approach has been adopted to process abbreviations, but it cannot handle ad hoc abbreviations, and it is impossible to cover all abbreviations. To overcome these drawbacks, this paper proposes an automatic abbreviation expansion method called LMAAE (Language Model-based Automatic Abbreviation Expansion). In this method, the abbreviation is first divided into blocks; then, expansion candidates are generated by restoring each block; and finally, the expansion candidates are filtered and clustered to obtain the final expansion result according to the language model and clustering method. By restricting abbreviations to prefix abbreviations, the search space of expansion is reduced sharply. The search space is then further reduced by constraining the effectiveness and the length of the partition. To validate the effectiveness of the method, two types of experiments are designed. For standard abbreviations, the expansion results include most of the expansions in the dictionary. Therefore, the method has high precision. For ad hoc abbreviations, the precision of schema matching and knowledge fusion is increased by using this method to handle the abbreviations. Although the recall for standard abbreviations needs to be improved, this does not affect the method's value as a complement to the dictionary method.
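The pipeline outlined in the description (partition the abbreviation into blocks, restore each block by prefix matching, then filter candidates with a language model) can be illustrated with a minimal sketch. This is not the authors' LMAAE implementation: the vocabulary, the bigram counts, the function names, and the `max_block_len` and `top_k` parameters are all hypothetical, the bigram table is only a stand-in for a real language model, and the clustering step is omitted.

```python
from itertools import product

# Hypothetical toy vocabulary and bigram counts (illustrative, not from the paper).
VOCAB = ["blood", "pressure", "bladder", "pre", "body", "press"]
BIGRAM_COUNTS = {("blood", "pressure"): 50, ("body", "pressure"): 2}


def partitions(abbr, max_block_len=4):
    """Yield every split of abbr into contiguous blocks of bounded length."""
    if not abbr:
        yield []
        return
    for i in range(1, min(max_block_len, len(abbr)) + 1):
        head = abbr[:i]
        for rest in partitions(abbr[i:], max_block_len):
            yield [head] + rest


def restore(block, vocab):
    """Return the words that start with block (prefix-abbreviation restriction)."""
    return [w for w in vocab if w.startswith(block)]


def score(words, bigram_counts):
    """Toy language-model score: product of add-one-smoothed bigram counts."""
    s = 1
    for a, b in zip(words, words[1:]):
        s *= bigram_counts.get((a, b), 0) + 1
    return s


def expand(abbr, vocab=VOCAB, bigram_counts=BIGRAM_COUNTS,
           max_block_len=4, top_k=3):
    """Rank candidate expansions of a prefix abbreviation."""
    candidates = set()
    for blocks in partitions(abbr.lower(), max_block_len):
        options = [restore(b, vocab) for b in blocks]
        # Discard partitions containing a block that no word can restore.
        if all(options):
            candidates.update(product(*options))
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(candidates, key=lambda ws: (-score(ws, bigram_counts), ws))
    return [" ".join(ws) for ws in ranked[:top_k]]


print(expand("bp")[0])    # blood pressure
print(expand("blpr")[0])  # blood pressure
```

Bounding the block length and dropping partitions that contain an unrestorable block mirror, in a simplified way, the search-space reduction that the description attributes to constraining the partition.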
format Online
Article
Text
id pubmed-7103364
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Elsevier B.V.
record_format MEDLINE/PubMed
spelling pubmed-7103364 2020-03-31 Language model-based automatic prefix abbreviation expansion method for biomedical big data analysis Du, Xiaokun Zhu, Rongbo Li, Yanhong Anjum, Ashiq Future Gener Comput Syst Article In the biomedical domain, abbreviations appear more and more frequently in various data sets, which has caused significant obstacles to biomedical big data analysis. The dictionary-based approach has been adopted to process abbreviations, but it cannot handle ad hoc abbreviations, and it is impossible to cover all abbreviations. To overcome these drawbacks, this paper proposes an automatic abbreviation expansion method called LMAAE (Language Model-based Automatic Abbreviation Expansion). In this method, the abbreviation is first divided into blocks; then, expansion candidates are generated by restoring each block; and finally, the expansion candidates are filtered and clustered to obtain the final expansion result according to the language model and clustering method. By restricting abbreviations to prefix abbreviations, the search space of expansion is reduced sharply. The search space is then further reduced by constraining the effectiveness and the length of the partition. To validate the effectiveness of the method, two types of experiments are designed. For standard abbreviations, the expansion results include most of the expansions in the dictionary. Therefore, the method has high precision. For ad hoc abbreviations, the precision of schema matching and knowledge fusion is increased by using this method to handle the abbreviations. Although the recall for standard abbreviations needs to be improved, this does not affect the method's value as a complement to the dictionary method. Elsevier B.V. 2019-09 2019-03-28 /pmc/articles/PMC7103364/ /pubmed/32287562 http://dx.doi.org/10.1016/j.future.2019.01.016 Text en © 2019 Elsevier B.V. All rights reserved.
Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
spellingShingle Article
Du, Xiaokun
Zhu, Rongbo
Li, Yanhong
Anjum, Ashiq
Language model-based automatic prefix abbreviation expansion method for biomedical big data analysis
title Language model-based automatic prefix abbreviation expansion method for biomedical big data analysis
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7103364/
https://www.ncbi.nlm.nih.gov/pubmed/32287562
http://dx.doi.org/10.1016/j.future.2019.01.016