Language model-based automatic prefix abbreviation expansion method for biomedical big data analysis
Main authors: | Du, Xiaokun; Zhu, Rongbo; Li, Yanhong; Anjum, Ashiq |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier B.V., 2019 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7103364/ https://www.ncbi.nlm.nih.gov/pubmed/32287562 http://dx.doi.org/10.1016/j.future.2019.01.016 |
author | Du, Xiaokun; Zhu, Rongbo; Li, Yanhong; Anjum, Ashiq |
---|---|
collection | PubMed |
description | In the biomedical domain, abbreviations appear with increasing frequency across data sets, which poses a significant obstacle to biomedical big data analysis. Dictionary-based approaches have been adopted to process abbreviations, but they cannot handle ad hoc abbreviations, and no dictionary can cover all abbreviations. To overcome these drawbacks, this paper proposes an automatic abbreviation expansion method called LMAAE (Language Model-based Automatic Abbreviation Expansion). In this method, the abbreviation is first divided into blocks; expansion candidates are then generated by restoring each block; finally, the candidates are filtered and clustered using the language model and a clustering method to obtain the final expansion. Restricting the input to prefix abbreviations sharply reduces the search space of expansion, and the search space is reduced further by constraining the validity and the length of the partitions. To validate the effectiveness of the method, two types of experiments are designed. For standard abbreviations, the expansion results include most of the expansions found in the dictionary, so the method achieves high precision. For ad hoc abbreviations, using this method to handle the abbreviations increases the precision of schema matching and knowledge fusion. Although the recall for standard abbreviations still needs improvement, this does not diminish the method's value as a complement to the dictionary-based approach. |
format | Online Article Text |
id | pubmed-7103364 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Elsevier B.V. |
record_format | MEDLINE/PubMed |
journal | Future Gener Comput Syst |
dates | 2019-09; 2019-03-28 |
rights | © 2019 Elsevier B.V. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre, including this research content, immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database, with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active. |
title | Language model-based automatic prefix abbreviation expansion method for biomedical big data analysis |
topic | Article |
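
The description outlines three steps: split the abbreviation into blocks, restore each block into candidate words, then filter the candidates with a language model and cluster them. The following is a minimal Python sketch of the partitioning, restoration and language-model ranking steps, under stated assumptions; it is not the authors' LMAAE implementation. Assumptions: a block is treated as valid only if some vocabulary word begins with it; `min_len` and `max_blocks` are hypothetical parameters standing in for the partition constraints; a tiny add-one-smoothed bigram model replaces the paper's language model; the clustering step is omitted. All function names and the toy corpus are hypothetical.

```python
# Minimal sketch of prefix-abbreviation expansion (hypothetical, not the
# authors' LMAAE code): partition -> restore blocks -> rank with a bigram LM.
from collections import Counter
from itertools import product
from math import log


def partitions(abbr, vocab, min_len=1, max_blocks=4):
    """Yield splits of `abbr` into blocks; a block counts as valid only if
    some vocabulary word starts with it (assumed validity rule)."""
    if not abbr:
        yield []
        return
    if max_blocks == 0:  # block budget exhausted: prune this branch
        return
    for end in range(min_len, len(abbr) + 1):
        block = abbr[:end]
        if any(w.startswith(block) for w in vocab):
            for rest in partitions(abbr[end:], vocab, min_len, max_blocks - 1):
                yield [block] + rest


def candidates(abbr, vocab, **kw):
    """Restore each block to every vocabulary word it is a prefix of."""
    seen = set()
    for blocks in partitions(abbr, vocab, **kw):
        options = [[w for w in vocab if w.startswith(b)] for b in blocks]
        for words in product(*options):
            if words not in seen:
                seen.add(words)
                yield words


class BigramLM:
    """Add-one (Laplace) smoothed bigram model; a stand-in for the paper's LM."""

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for s in sentences:
            tokens = ["<s>"] + s.split()
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.v = len(self.unigrams)  # vocabulary size, including <s>

    def logprob(self, words):
        tokens = ["<s>"] + list(words)
        return sum(
            log((self.bigrams[(a, b)] + 1) / (self.unigrams[a] + self.v))
            for a, b in zip(tokens, tokens[1:])
        )


def expand(abbr, corpus, top_k=3, **kw):
    """Return the top_k candidate expansions ranked by LM log-probability."""
    lm = BigramLM(corpus)
    vocab = sorted(w for w in lm.unigrams if w != "<s>")
    scored = [(" ".join(c), lm.logprob(c))
              for c in candidates(abbr.lower(), vocab, **kw)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in scored[:top_k]]


# Toy corpus, invented for illustration only.
TOY_CORPUS = [
    "patient has diabetes mellitus",
    "diabetes mellitus type two",
    "blood pressure was normal",
    "diastolic blood pressure",
]

if __name__ == "__main__":
    print(expand("diabmell", TOY_CORPUS))
    print(expand("BP", TOY_CORPUS))
```

With this toy corpus, `expand("diabmell", TOY_CORPUS)` ranks "diabetes mellitus" first and `expand("BP", TOY_CORPUS)` ranks "blood pressure" first. Lowering `max_blocks` illustrates how constraining the partitions shrinks the search space: with `max_blocks=2`, the three-block split "dia" + "b" + "mell" is never generated.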