
Verification of De-Identification Techniques for Personal Information Using Tree-Based Methods with Shapley Values

With the development of big data and cloud computing technologies, the importance of pseudonym information has grown. However, the tools for verifying whether the de-identification methodology is correctly applied to ensure data confidentiality and usability are insufficient. This paper proposes a v...


Bibliographic Details
Main authors: Lee, Junhak; Jeong, Jinwoo; Jung, Sungji; Moon, Jihoon; Rho, Seungmin
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8877642/
https://www.ncbi.nlm.nih.gov/pubmed/35207676
http://dx.doi.org/10.3390/jpm12020190
author Lee, Junhak
Jeong, Jinwoo
Jung, Sungji
Moon, Jihoon
Rho, Seungmin
author_sort Lee, Junhak
collection PubMed
description With the development of big data and cloud computing technologies, the importance of pseudonym information has grown. However, the tools for verifying whether the de-identification methodology is correctly applied to ensure data confidentiality and usability are insufficient. This paper proposes a verification of de-identification techniques for personal healthcare information by considering data confidentiality and usability. Data are generated and preprocessed by considering the actual statistical data, personal information datasets, and de-identification datasets based on medical data to represent the de-identification technique as a numeric dataset. Five tree-based regression models (i.e., decision tree, random forest, gradient boosting machine, extreme gradient boosting, and light gradient boosting machine) are constructed using the de-identification dataset to effectively discover nonlinear relationships between dependent and independent variables in numerical datasets. Then, the most effective model is selected from personal information data in which pseudonym processing is essential for data utilization. The Shapley additive explanation, an explainable artificial intelligence technique, is applied to the most effective model to establish pseudonym processing policies and machine learning to present a machine-learning process that selects an appropriate de-identification methodology.
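The description above names the Shapley additive explanation as the technique applied to the best tree-based model. As a rough illustration of the underlying idea only, the sketch below computes exact Shapley values by enumerating feature coalitions for a small hand-written regression tree. Everything in it is an assumption made for illustration: the tree, the feature names `masking_level` and `generalization_level`, the baseline record, and the choice to replace absent features with baseline values. It is not the authors' model or data, and it is not the optimized tree algorithm used by the SHAP library.

```python
from itertools import combinations
from math import factorial


def tree_predict(x):
    """Hypothetical two-level regression tree (illustrative values only)."""
    if x["masking_level"] <= 1:
        return 10.0 if x["generalization_level"] <= 2 else 30.0
    return 50.0 if x["generalization_level"] <= 2 else 90.0


def shapley_values(predict, instance, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    features = list(instance)
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k: k! (n - k - 1)! / n!
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_f = {
                    g: instance[g] if (g in subset or g == f) else baseline[g]
                    for g in features
                }
                without_f = {
                    g: instance[g] if g in subset else baseline[g]
                    for g in features
                }
                # Marginal contribution of f when added to this coalition
                total += weight * (predict(with_f) - predict(without_f))
        phi[f] = total
    return phi


# Hypothetical record to explain and a hypothetical reference record
instance = {"masking_level": 2, "generalization_level": 3}
baseline = {"masking_level": 0, "generalization_level": 0}

phi = shapley_values(tree_predict, instance, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.1f}")
```

The attributions sum to the difference between the prediction for the instance and the prediction for the baseline, which is the additivity property that gives the method its name. Enumeration is exponential in the number of features, so this form is practical only for very small feature sets.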
format Online
Article
Text
id pubmed-8877642
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8877642 2022-02-26 Verification of De-Identification Techniques for Personal Information Using Tree-Based Methods with Shapley Values Lee, Junhak Jeong, Jinwoo Jung, Sungji Moon, Jihoon Rho, Seungmin J Pers Med Article With the development of big data and cloud computing technologies, the importance of pseudonym information has grown. However, the tools for verifying whether the de-identification methodology is correctly applied to ensure data confidentiality and usability are insufficient. This paper proposes a verification of de-identification techniques for personal healthcare information by considering data confidentiality and usability. Data are generated and preprocessed by considering the actual statistical data, personal information datasets, and de-identification datasets based on medical data to represent the de-identification technique as a numeric dataset. Five tree-based regression models (i.e., decision tree, random forest, gradient boosting machine, extreme gradient boosting, and light gradient boosting machine) are constructed using the de-identification dataset to effectively discover nonlinear relationships between dependent and independent variables in numerical datasets. Then, the most effective model is selected from personal information data in which pseudonym processing is essential for data utilization. The Shapley additive explanation, an explainable artificial intelligence technique, is applied to the most effective model to establish pseudonym processing policies and machine learning to present a machine-learning process that selects an appropriate de-identification methodology. MDPI 2022-01-31 /pmc/articles/PMC8877642/ /pubmed/35207676 http://dx.doi.org/10.3390/jpm12020190 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Verification of De-Identification Techniques for Personal Information Using Tree-Based Methods with Shapley Values
topic Article