
Soft Bigram distance for names matching

BACKGROUND: Bi-gram distance (BI-DIST) is a recent approach to measuring the distance between two strings that plays an important role in a wide range of applications. BI-DIST owes its importance to its representational and computational efficiency, which has led to extensive rese...


Bibliographic Details
Main Authors: Hadwan, Mohammed, Al-Hagery, Mohammed A., Al-Sanabani, Maher, Al-Hagree, Salah
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2021
Subjects: Artificial Intelligence
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8080420/
https://www.ncbi.nlm.nih.gov/pubmed/33981836
http://dx.doi.org/10.7717/peerj-cs.465
_version_ 1783685421987790848
author Hadwan, Mohammed
Al-Hagery, Mohammed A.
Al-Sanabani, Maher
Al-Hagree, Salah
collection PubMed
description BACKGROUND: Bi-gram distance (BI-DIST) is a recent approach to measuring the distance between two strings that plays an important role in a wide range of applications. BI-DIST owes its importance to its representational and computational efficiency, which has prompted extensive research to enhance it further. However, developing an algorithm that measures the distance between strings both accurately and efficiently remains a major challenge. Consequently, this research aims to design an algorithm that matches names accurately. BI-DIST is considered the best orthographic measure for name identification; nevertheless, it lacks a distance scale between name bigrams. METHODS: This research proposes the Soft Bigram Distance (Soft-Bidist) measure, which extends BI-DIST by softening the scale of comparison among name bigrams to improve name matching. Different datasets are used to demonstrate the efficiency of the proposed method. RESULTS: The results show that Soft-Bidist outperforms the compared algorithms on different name-matching datasets.
format Online
Article
Text
id pubmed-8080420
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-8080420 2021-05-11 Soft Bigram distance for names matching. Hadwan, Mohammed; Al-Hagery, Mohammed A.; Al-Sanabani, Maher; Al-Hagree, Salah. PeerJ Comput Sci. Artificial Intelligence. PeerJ Inc. 2021-04-21. /pmc/articles/PMC8080420/ /pubmed/33981836 http://dx.doi.org/10.7717/peerj-cs.465 Text en ©2021 Hadwan et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title Soft Bigram distance for names matching
topic Artificial Intelligence
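
Illustrative sketch: the abstract in this record says only that Soft-Bidist "softens the scale of comparison" among name bigrams and gives no formula. The Python sketch below is therefore not the authors' published algorithm. It shows one plausible reading of the idea: an edit distance computed over character bigrams, where a substitution between two bigrams that share a character in the same position costs less than a full mismatch. The function names (`bigrams`, `hard_cost`, `soft_cost`, `bigram_distance`), the `#` padding character, and the 0.5 partial-match weight are all assumptions made for illustration.

```python
def bigrams(name, pad="#"):
    """Split a name into overlapping character bigrams.

    The name is lower-cased and prefixed with one padding character
    (an assumed convention) so the first letter appears in a bigram too.
    """
    s = pad + name.lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]


def hard_cost(a, b):
    """All-or-nothing comparison: bigrams either match or they do not."""
    return 0.0 if a == b else 1.0


def soft_cost(a, b, partial=0.5):
    """Graded comparison (illustrative weights, not taken from the paper).

    Identical bigrams cost 0, bigrams that agree in exactly one position
    cost `partial`, and bigrams with no positional agreement cost 1.
    """
    if a == b:
        return 0.0
    if a[0] == b[0] or a[1] == b[1]:
        return partial
    return 1.0


def bigram_distance(x, y, cost=soft_cost):
    """Edit distance over bigram sequences, normalised to [0, 1]."""
    p, q = bigrams(x), bigrams(y)
    if not p and not q:
        return 0.0
    # Standard dynamic-programming edit distance, kept to two rows.
    prev = [float(j) for j in range(len(q) + 1)]
    for i, a in enumerate(p, 1):
        curr = [float(i)]
        for j, b in enumerate(q, 1):
            curr.append(min(prev[j] + 1.0,             # delete a
                            curr[j - 1] + 1.0,         # insert b
                            prev[j - 1] + cost(a, b))) # substitute
        prev = curr
    return prev[-1] / max(len(p), len(q))


if __name__ == "__main__":
    print(bigram_distance("Smith", "Smyth", cost=hard_cost))  # 0.4
    print(bigram_distance("Smith", "Smyth", cost=soft_cost))  # 0.2
```

With the all-or-nothing cost, "Smith" and "Smyth" differ in two of five bigrams. With the graded cost, each of those two bigrams still shares one character in position, so the distance halves. This is the kind of finer scale the abstract says plain BI-DIST lacks.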