Soft Bigram distance for names matching
BACKGROUND: Bi-gram distance (BI-DIST) is a recent approach to measure the distance between two strings that have an important role in a wide range of applications in various areas. The importance of BI-DIST is due to its representational and computational efficiency, which has led to extensive rese...
Main Authors: | Hadwan, Mohammed; Al-Hagery, Mohammed A.; Al-Sanabani, Maher; Al-Hagree, Salah |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2021 |
Subjects: | Artificial Intelligence |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8080420/ https://www.ncbi.nlm.nih.gov/pubmed/33981836 http://dx.doi.org/10.7717/peerj-cs.465 |
_version_ | 1783685421987790848 |
---|---|
author | Hadwan, Mohammed; Al-Hagery, Mohammed A.; Al-Sanabani, Maher; Al-Hagree, Salah |
author_facet | Hadwan, Mohammed; Al-Hagery, Mohammed A.; Al-Sanabani, Maher; Al-Hagree, Salah |
author_sort | Hadwan, Mohammed |
collection | PubMed |
description | BACKGROUND: Bi-gram distance (BI-DIST) is a recent approach to measure the distance between two strings that have an important role in a wide range of applications in various areas. The importance of BI-DIST is due to its representational and computational efficiency, which has led to extensive research to further enhance its efficiency. However, developing an algorithm that can measure the distance of strings accurately and efficiently has posed a major challenge to many developers. Consequently, this research aims to design an algorithm that can match the names accurately. BI-DIST distance is considered the best orthographic measure for names identification; nevertheless, it lacks a distance scale between the name bigrams. METHODS: In this research, the Soft Bigram Distance (Soft-Bidist) measure is proposed. It is an extension of BI-DIST by softening the scale of comparison among the name Bigrams for improving the name matching. Different datasets are used to demonstrate the efficiency of the proposed method. RESULTS: The results show that Soft-Bidist outperforms the compared algorithms using different name matching datasets. |
format | Online Article Text |
id | pubmed-8080420 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-80804202021-05-11 Soft Bigram distance for names matching Hadwan, Mohammed Al-Hagery, Mohammed A. Al-Sanabani, Maher Al-Hagree, Salah PeerJ Comput Sci Artificial Intelligence BACKGROUND: Bi-gram distance (BI-DIST) is a recent approach to measure the distance between two strings that have an important role in a wide range of applications in various areas. The importance of BI-DIST is due to its representational and computational efficiency, which has led to extensive research to further enhance its efficiency. However, developing an algorithm that can measure the distance of strings accurately and efficiently has posed a major challenge to many developers. Consequently, this research aims to design an algorithm that can match the names accurately. BI-DIST distance is considered the best orthographic measure for names identification; nevertheless, it lacks a distance scale between the name bigrams. METHODS: In this research, the Soft Bigram Distance (Soft-Bidist) measure is proposed. It is an extension of BI-DIST by softening the scale of comparison among the name Bigrams for improving the name matching. Different datasets are used to demonstrate the efficiency of the proposed method. RESULTS: The results show that Soft-Bidist outperforms the compared algorithms using different name matching datasets. PeerJ Inc. 2021-04-21 /pmc/articles/PMC8080420/ /pubmed/33981836 http://dx.doi.org/10.7717/peerj-cs.465 Text en ©2021 Hadwan et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited. |
spellingShingle | Artificial Intelligence Hadwan, Mohammed Al-Hagery, Mohammed A. Al-Sanabani, Maher Al-Hagree, Salah Soft Bigram distance for names matching |
title | Soft Bigram distance for names matching |
title_full | Soft Bigram distance for names matching |
title_fullStr | Soft Bigram distance for names matching |
title_full_unstemmed | Soft Bigram distance for names matching |
title_short | Soft Bigram distance for names matching |
title_sort | soft bigram distance for names matching |
topic | Artificial Intelligence |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8080420/ https://www.ncbi.nlm.nih.gov/pubmed/33981836 http://dx.doi.org/10.7717/peerj-cs.465 |
work_keys_str_mv | AT hadwanmohammed softbigramdistancefornamesmatching AT alhagerymohammeda softbigramdistancefornamesmatching AT alsanabanimaher softbigramdistancefornamesmatching AT alhagreesalah softbigramdistancefornamesmatching |