Verification of De-Identification Techniques for Personal Information Using Tree-Based Methods with Shapley Values
With the development of big data and cloud computing technologies, the importance of pseudonym information has grown. However, the tools for verifying whether the de-identification methodology is correctly applied to ensure data confidentiality and usability are insufficient. This paper proposes a v...
Main Authors: | Lee, Junhak; Jeong, Jinwoo; Jung, Sungji; Moon, Jihoon; Rho, Seungmin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8877642/ https://www.ncbi.nlm.nih.gov/pubmed/35207676 http://dx.doi.org/10.3390/jpm12020190 |
_version_ | 1784658468561485824 |
---|---|
author | Lee, Junhak Jeong, Jinwoo Jung, Sungji Moon, Jihoon Rho, Seungmin |
author_facet | Lee, Junhak Jeong, Jinwoo Jung, Sungji Moon, Jihoon Rho, Seungmin |
author_sort | Lee, Junhak |
collection | PubMed |
description | With the development of big data and cloud computing technologies, the importance of pseudonym information has grown. However, the tools for verifying whether the de-identification methodology is correctly applied to ensure data confidentiality and usability are insufficient. This paper proposes a verification of de-identification techniques for personal healthcare information by considering data confidentiality and usability. Data are generated and preprocessed by considering the actual statistical data, personal information datasets, and de-identification datasets based on medical data to represent the de-identification technique as a numeric dataset. Five tree-based regression models (i.e., decision tree, random forest, gradient boosting machine, extreme gradient boosting, and light gradient boosting machine) are constructed using the de-identification dataset to effectively discover nonlinear relationships between dependent and independent variables in numerical datasets. Then, the most effective model is selected from personal information data in which pseudonym processing is essential for data utilization. The Shapley additive explanation, an explainable artificial intelligence technique, is applied to the most effective model to establish pseudonym processing policies and machine learning to present a machine-learning process that selects an appropriate de-identification methodology. |
format | Online Article Text |
id | pubmed-8877642 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-8877642 2022-02-26 Verification of De-Identification Techniques for Personal Information Using Tree-Based Methods with Shapley Values Lee, Junhak Jeong, Jinwoo Jung, Sungji Moon, Jihoon Rho, Seungmin J Pers Med Article With the development of big data and cloud computing technologies, the importance of pseudonym information has grown. However, the tools for verifying whether the de-identification methodology is correctly applied to ensure data confidentiality and usability are insufficient. This paper proposes a verification of de-identification techniques for personal healthcare information by considering data confidentiality and usability. Data are generated and preprocessed by considering the actual statistical data, personal information datasets, and de-identification datasets based on medical data to represent the de-identification technique as a numeric dataset. Five tree-based regression models (i.e., decision tree, random forest, gradient boosting machine, extreme gradient boosting, and light gradient boosting machine) are constructed using the de-identification dataset to effectively discover nonlinear relationships between dependent and independent variables in numerical datasets. Then, the most effective model is selected from personal information data in which pseudonym processing is essential for data utilization. The Shapley additive explanation, an explainable artificial intelligence technique, is applied to the most effective model to establish pseudonym processing policies and machine learning to present a machine-learning process that selects an appropriate de-identification methodology. MDPI 2022-01-31 /pmc/articles/PMC8877642/ /pubmed/35207676 http://dx.doi.org/10.3390/jpm12020190 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Lee, Junhak Jeong, Jinwoo Jung, Sungji Moon, Jihoon Rho, Seungmin Verification of De-Identification Techniques for Personal Information Using Tree-Based Methods with Shapley Values |
title | Verification of De-Identification Techniques for Personal Information Using Tree-Based Methods with Shapley Values |
title_full | Verification of De-Identification Techniques for Personal Information Using Tree-Based Methods with Shapley Values |
title_fullStr | Verification of De-Identification Techniques for Personal Information Using Tree-Based Methods with Shapley Values |
title_full_unstemmed | Verification of De-Identification Techniques for Personal Information Using Tree-Based Methods with Shapley Values |
title_short | Verification of De-Identification Techniques for Personal Information Using Tree-Based Methods with Shapley Values |
title_sort | verification of de-identification techniques for personal information using tree-based methods with shapley values |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8877642/ https://www.ncbi.nlm.nih.gov/pubmed/35207676 http://dx.doi.org/10.3390/jpm12020190 |
work_keys_str_mv | AT leejunhak verificationofdeidentificationtechniquesforpersonalinformationusingtreebasedmethodswithshapleyvalues AT jeongjinwoo verificationofdeidentificationtechniquesforpersonalinformationusingtreebasedmethodswithshapleyvalues AT jungsungji verificationofdeidentificationtechniquesforpersonalinformationusingtreebasedmethodswithshapleyvalues AT moonjihoon verificationofdeidentificationtechniquesforpersonalinformationusingtreebasedmethodswithshapleyvalues AT rhoseungmin verificationofdeidentificationtechniquesforpersonalinformationusingtreebasedmethodswithshapleyvalues |