A Privacy-Preserving Distributed Medical Data Integration Security System for Accuracy Assessment of Cancer Screening: Development Study of Novel Data Integration System

BACKGROUND: Big data useful for epidemiological research can be obtained by integrating data corresponding to individuals between databases managed by different institutions. Privacy information must be protected while performing efficient, high-level data matching. OBJECTIVE: Privacy-preserving dis...

Bibliographic Details
Main Authors: Miyaji, Atsuko; Watanabe, Kaname; Takano, Yuuki; Nakasho, Kazuhisa; Nakamura, Sho; Wang, Yuntao; Narimatsu, Hiroto
Format: Online Article Text
Language: English
Published: JMIR Publications 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9840098/
https://www.ncbi.nlm.nih.gov/pubmed/36583931
http://dx.doi.org/10.2196/38922
author Miyaji, Atsuko
Watanabe, Kaname
Takano, Yuuki
Nakasho, Kazuhisa
Nakamura, Sho
Wang, Yuntao
Narimatsu, Hiroto
collection PubMed
description BACKGROUND: Big data useful for epidemiological research can be obtained by integrating data corresponding to individuals between databases managed by different institutions. Privacy information must be protected while performing efficient, high-level data matching.
OBJECTIVE: Privacy-preserving distributed data integration (PDDI) enables data matching between multiple databases without moving privacy information; however, its actual implementation requires matching security, accuracy, and performance. Moreover, identifying the optimal data item in the absence of a unique matching key is necessary. We aimed to conduct a basic matching experiment using a model to assess the accuracy of cancer screening.
METHODS: To experiment with actual data, we created a data set mimicking the cancer screening and registration data in Japan and conducted a matching experiment using a PDDI system between geographically distant institutions. Errors similar to those found empirically in data sets recorded in Japanese were artificially introduced into the data set. The matching-key error rate of the data common to both data sets was set sufficiently higher than expected in the actual database: 85.0% and 59.0% for the data simulating colorectal and breast cancers, respectively. Various combinations of name, gender, date of birth, and address were used for the matching key. To evaluate the matching accuracy, the matching sensitivity and specificity were calculated based on the number of cancer-screening data points, and the effect of matching accuracy on the sensitivity and specificity of cancer screening was estimated based on the obtained values. To evaluate the performance, we measured central processing unit use, memory use, and network traffic.
RESULTS: For combinations with a specificity ≥99% and high sensitivity, the date of birth and first name were used in the data simulating colorectal cancer, and the matching sensitivity and specificity were 55.00% and 99.85%, respectively. In the data simulating breast cancer, the date of birth and family name were used, and the matching sensitivity and specificity were 88.71% and 99.98%, respectively. Assuming the sensitivity and specificity of cancer screening at 90%, the apparent values decreased to 74.90% and 89.93%, respectively. A trial calculation was performed using a combination with the same data set and 100% specificity. When the matching sensitivity was 82.26%, the apparent screening sensitivity was maintained at 90%, and the screening specificity decreased to 89.89%. For 214 data points, the execution time was 82 minutes and 26 seconds without parallelization and 11 minutes and 38 seconds with parallelization; 19.33% of the calculation time was for the data-holding institutions. Memory use was 3.4 GB for the PDDI server and 2.7 GB for the data-holding institutions.
CONCLUSIONS: We demonstrated the rudimentary feasibility of introducing a PDDI system for cancer-screening accuracy assessment. We plan to conduct matching experiments based on actual data and compare them with the existing methods.
format Online
Article
Text
id pubmed-9840098
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-9840098 2023-01-15 A Privacy-Preserving Distributed Medical Data Integration Security System for Accuracy Assessment of Cancer Screening: Development Study of Novel Data Integration System Miyaji, Atsuko Watanabe, Kaname Takano, Yuuki Nakasho, Kazuhisa Nakamura, Sho Wang, Yuntao Narimatsu, Hiroto JMIR Med Inform Original Paper JMIR Publications 2022-12-30 /pmc/articles/PMC9840098/ /pubmed/36583931 http://dx.doi.org/10.2196/38922 Text en ©Atsuko Miyaji, Kaname Watanabe, Yuuki Takano, Kazuhisa Nakasho, Sho Nakamura, Yuntao Wang, Hiroto Narimatsu. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 30.12.2022.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
title A Privacy-Preserving Distributed Medical Data Integration Security System for Accuracy Assessment of Cancer Screening: Development Study of Novel Data Integration System
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9840098/
https://www.ncbi.nlm.nih.gov/pubmed/36583931
http://dx.doi.org/10.2196/38922