Record linkage of banks and municipalities through multiple criteria and neural networks
Record linkage aims to identify records from multiple data sources that refer to the same real-world entity. It is a well-known data quality process studied since the second half of the last century, with an established pipeline and a rich literature of case studies mainly covering census, ad...
Main authors: | Maratea, Antonio; Ciaramella, Angelo; Cianci, Giuseppe Pio |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc., 2020 |
Subjects: | Artificial Intelligence |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924437/ https://www.ncbi.nlm.nih.gov/pubmed/33816910 http://dx.doi.org/10.7717/peerj-cs.258 |
Field | Value |
---|---|
author | Maratea, Antonio; Ciaramella, Angelo; Cianci, Giuseppe Pio |
collection | PubMed |
description | Record linkage aims to identify records from multiple data sources that refer to the same real-world entity. It is a well-known data quality process, studied since the second half of the last century, with an established pipeline and a rich literature of case studies mainly covering census, administrative, or health domains. In this paper, a method is proposed to recognize matching records from real municipalities and banks through multiple similarity criteria and a neural network classifier: starting from a labeled subset of the available data, several similarity measures are first combined and weighted to build a feature vector, then a Multi-Layer Perceptron (MLP) network is trained and tested to find matching pairs. For validation, seven real datasets were used (three from banks and four from municipalities), purposely chosen in the same geographical area to increase the probability of matches. Training involved only two municipalities, while testing involved all sources (municipalities vs. municipalities, banks vs. banks, and municipalities vs. banks). The proposed method achieved strong results in terms of both precision and recall, clearly outperforming threshold-based competitors. |
format | Online Article Text |
id | pubmed-7924437 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
journal | PeerJ Comput Sci |
publication date | 2020-02-24 |
rights | ©2020 Maratea et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited. |
title | Record linkage of banks and municipalities through multiple criteria and neural networks |
topic | Artificial Intelligence |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924437/ https://www.ncbi.nlm.nih.gov/pubmed/33816910 http://dx.doi.org/10.7717/peerj-cs.258 |
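
The pipeline summarized in the description can be illustrated with a minimal, dependency-free Python sketch. Several similarity measures are computed on a candidate record pair and weighted into a feature vector, and a Multi-Layer Perceptron then classifies the pair as match or non-match. Everything specific here is an assumption for illustration, not the authors' method. That includes the four measures (normalized Levenshtein similarity, `difflib`'s Ratcliff/Obershelp ratio, token Jaccard and character-bigram Jaccard), the uniform `WEIGHTS`, and the hypothetical `TinyMLP` class with one hidden layer of four sigmoid units trained by plain SGD. The toy bank and municipality record pairs are invented as well.

```python
import difflib
import math
import random

# Hypothetical per-measure weights (uniform here), not the paper's actual weighting.
WEIGHTS = (1.0, 1.0, 1.0, 1.0)


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def jaccard(x: set, y: set) -> float:
    """Jaccard similarity; two empty sets count as identical."""
    return len(x & y) / len(x | y) if x | y else 1.0


def bigrams(s: str) -> set:
    return {s[i:i + 2] for i in range(len(s) - 1)}


def features(a: str, b: str) -> list:
    """Weighted vector of similarity scores in [0, 1] for one record pair."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b)) or 1
    sims = [
        1.0 - levenshtein(a, b) / longest,            # normalized edit similarity
        difflib.SequenceMatcher(None, a, b).ratio(),  # Ratcliff/Obershelp ratio
        jaccard(set(a.split()), set(b.split())),      # token-level Jaccard
        jaccard(bigrams(a), bigrams(b)),              # character-bigram Jaccard
    ]
    return [w * s for w, s in zip(WEIGHTS, sims)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyMLP:
    """Hypothetical one-hidden-layer MLP, trained by SGD on binary cross-entropy."""

    def __init__(self, n_in: int, n_hidden: int = 4, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return h, sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)

    def predict_proba(self, x) -> float:
        return self._forward(x)[1]

    def fit(self, X, targets, epochs: int = 2000, lr: float = 0.5):
        for _ in range(epochs):
            for x, t in zip(X, targets):
                h, out = self._forward(x)
                d_out = out - t  # d(cross-entropy)/d(output pre-activation)
                for j, hj in enumerate(h):
                    d_hid = d_out * self.w2[j] * hj * (1.0 - hj)  # uses w2[j] before its update
                    self.w2[j] -= lr * d_out * hj
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * d_hid * xi
                    self.b1[j] -= lr * d_hid
                self.b2 -= lr * d_out
        return self


# Hypothetical labeled pairs (1 = same real-world entity, 0 = different entities).
train_pairs = [
    ("Comune di Napoli", "Comune Napoli", 1),
    ("Banca Popolare di Bari", "Banca Pop. di Bari", 1),
    ("Via Roma 12, Salerno", "Via Roma 12 Salerno", 1),
    ("Comune di Napoli", "Banca Popolare di Bari", 0),
    ("Via Roma 12, Salerno", "Banca Pop. di Bari", 0),
    ("Comune Napoli", "Via Roma 12 Salerno", 0),
]
X = [features(a, b) for a, b, _ in train_pairs]
labels = [label for _, _, label in train_pairs]
model = TinyMLP(n_in=len(X[0])).fit(X, labels)
```

After training, `model.predict_proba(features(a, b))` gives a match probability for any new pair, which can be declared a match above a chosen cut-off such as 0.5. A threshold-based matcher applies a fixed cut-off to a single similarity score. The network instead learns from labeled pairs how to combine several scores. In practice, a library MLP such as scikit-learn's `MLPClassifier` would normally replace the hand-written one.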