
Record linkage of banks and municipalities through multiple criteria and neural networks

Record linkage aims to identify records from multiple data sources that refer to the same real-world entity. It is a well-known data quality process studied since the second half of the last century, with an established pipeline and a rich literature of case studies mainly covering census, ad...


Bibliographic Details
Main authors: Maratea, Antonio; Ciaramella, Angelo; Cianci, Giuseppe Pio
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2020
Subjects: Artificial Intelligence
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924437/
https://www.ncbi.nlm.nih.gov/pubmed/33816910
http://dx.doi.org/10.7717/peerj-cs.258
author Maratea, Antonio
Ciaramella, Angelo
Cianci, Giuseppe Pio
collection PubMed
description Record linkage aims to identify records from multiple data sources that refer to the same real-world entity. It is a well-known data quality process studied since the second half of the last century, with an established pipeline and a rich literature of case studies mainly covering census, administrative or health domains. This paper proposes a method to recognize matching records from real municipalities and banks through multiple similarity criteria and a Neural Network classifier: starting from a labeled subset of the available data, several similarity measures are first combined and weighted to build a feature vector, then a Multi-Layer Perceptron (MLP) network is trained and tested to find matching pairs. For validation, seven real datasets were used (three from banks and four from municipalities), purposely chosen in the same geographical area to increase the probability of matches. Training involved only two municipalities, while testing involved all sources (municipalities vs. municipalities, banks vs. banks, and municipalities vs. banks). The proposed method achieved remarkable results in terms of both precision and recall, clearly outperforming threshold-based competitors.
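The feature-building step mentioned in the description can be illustrated with a minimal sketch. This record does not say which similarity measures, fields, or weights the authors used, so everything below is an assumption made for illustration: the field names (`name`, `address`, `birth_date`), the three similarity functions, and the sample records are all hypothetical, and no weighting is applied.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of whitespace tokens; insensitive to word order."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def exact_match(a: str, b: str) -> float:
    """1.0 if the normalized values are identical, else 0.0."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


def feature_vector(rec_a: dict, rec_b: dict) -> list:
    """Combine several similarity criteria into one vector for a record pair.

    The choice of fields and measures is hypothetical, not the paper's.
    """
    return [
        string_similarity(rec_a["name"], rec_b["name"]),
        token_jaccard(rec_a["name"], rec_b["name"]),
        string_similarity(rec_a["address"], rec_b["address"]),
        exact_match(rec_a["birth_date"], rec_b["birth_date"]),
    ]


# Made-up example pair: one record from a bank, one from a municipality.
bank_record = {"name": "Rossi Mario", "address": "Via Roma 10", "birth_date": "1970-01-01"}
town_record = {"name": "Mario Rossi", "address": "via Roma 10", "birth_date": "1970-01-01"}

features = feature_vector(bank_record, town_record)
print(features)
```

Vectors like this one, computed for labeled matching and non-matching pairs, would then serve as training input for a classifier such as an MLP. The token-based measure is included because swapped name order, as in the example pair, lowers the character-level score even when the records refer to the same person.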
format Online
Article
Text
id pubmed-7924437
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-7924437 2021-04-02 Record linkage of banks and municipalities through multiple criteria and neural networks. Maratea, Antonio; Ciaramella, Angelo; Cianci, Giuseppe Pio. PeerJ Comput Sci. Artificial Intelligence. PeerJ Inc. 2020-02-24 /pmc/articles/PMC7924437/ /pubmed/33816910 http://dx.doi.org/10.7717/peerj-cs.258 Text en ©2020 Maratea et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title Record linkage of banks and municipalities through multiple criteria and neural networks
topic Artificial Intelligence