
An Improved Chinese String Comparator for Bloom Filter Based Privacy-Preserving Record Linkage

With the development of information technology, it has become a popular topic to share data from multiple sources without privacy disclosure problems. Privacy-preserving record linkage (PPRL) can link the data that truly matches and does not disclose personal information. In the existing studies, th...


Bibliographic Details
Main authors: Sun, Siqi; Qian, Yining; Zhang, Ruoshi; Wang, Yanqi; Li, Xinran
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8394278/
https://www.ncbi.nlm.nih.gov/pubmed/34441231
http://dx.doi.org/10.3390/e23081091
author Sun, Siqi
Qian, Yining
Zhang, Ruoshi
Wang, Yanqi
Li, Xinran
collection PubMed
description With the development of information technology, sharing data from multiple sources without disclosing private information has become a popular topic. Privacy-preserving record linkage (PPRL) links records that truly match without disclosing personal information. Existing PPRL techniques have mostly been studied for alphabetic languages, which differ considerably from the Chinese language environment. In this paper, Chinese characters (identification fields in record pairs) are encoded into strings of letters and numbers using the SoundShape code, according to their shapes and pronunciations. The SoundShape codes are then encrypted by a Bloom filter, and the similarity of the encrypted fields is calculated using Dice similarity. The method considers the false positive rate of the Bloom filter and different proportions of sound code and shape code. Finally, we applied the method to synthetic datasets and compared precision, recall, F1-score, and computational time for different values of the false positive rate and proportion. The results showed that our method for PPRL in the Chinese language environment improved the quality of the classification results and outperformed other methods at a relatively low additional computational cost.
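The pipeline summarized in the description (encode a field as a code string, map the code into a Bloom filter, compare filters with Dice similarity) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The code strings are made-up placeholders rather than real SoundShape codes, and the bigram tokenization, the filter length `m`, the number of hash functions `k`, the double-hashing scheme, and the unkeyed SHA-1/SHA-256 hashes are all assumptions chosen for brevity.

```python
import hashlib


def bigrams(code):
    """Return the set of overlapping 2-character substrings of a code string."""
    return {code[i:i + 2] for i in range(len(code) - 1)}


def bloom_encode(code, m=256, k=4):
    """Hash each bigram of `code` into an m-bit Bloom filter with k positions.

    The filter is represented as the set of bit positions that are set to 1.
    Positions come from double hashing, (h1 + i * h2) mod m, which is an
    assumed scheme for this sketch and is not taken from the catalog record.
    """
    bits = set()
    for gram in bigrams(code):
        data = gram.encode("utf-8")
        h1 = int(hashlib.sha1(data).hexdigest(), 16)
        h2 = int(hashlib.sha256(data).hexdigest(), 16)
        for i in range(k):
            bits.add((h1 + i * h2) % m)
    return bits


def dice(a, b):
    """Dice similarity of two bit-position sets: 2 * |a & b| / (|a| + |b|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Placeholder code strings (NOT real SoundShape codes).
code_a = "A1B2C3D4E5"
code_b = "A1B2C3D4E6"   # differs from code_a in the last character only
code_c = "Z9Y8X7W6V5"   # shares no bigram with code_a

sim_close = dice(bloom_encode(code_a), bloom_encode(code_b))
sim_far = dice(bloom_encode(code_a), bloom_encode(code_c))
print(sim_close, sim_far)
```

Two codes that share most of their bigrams set mostly the same bit positions, so their Dice score is close to 1, while unrelated codes overlap only through hash collisions. A smaller `m` or a larger `k` raises the collision rate, which is the false positive rate the description refers to.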
format Online
Article
Text
id pubmed-8394278
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8394278 2021-08-28 An Improved Chinese String Comparator for Bloom Filter Based Privacy-Preserving Record Linkage Sun, Siqi Qian, Yining Zhang, Ruoshi Wang, Yanqi Li, Xinran Entropy (Basel) Article MDPI 2021-08-22 /pmc/articles/PMC8394278/ /pubmed/34441231 http://dx.doi.org/10.3390/e23081091 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title An Improved Chinese String Comparator for Bloom Filter Based Privacy-Preserving Record Linkage
topic Article