
A privacy-preserving distributed filtering framework for NLP artifacts

Bibliographic details
Main authors:	Sadat, Md Nazmus; Aziz, Md Momin Al; Mohammed, Noman; Pakhomov, Serguei; Liu, Hongfang; Jiang, Xiaoqian
Format:	Online article, text
Language:	English
Published:	BioMed Central, 2019
Subjects:	Software
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6731605/ https://www.ncbi.nlm.nih.gov/pubmed/31493797 http://dx.doi.org/10.1186/s12911-019-0867-z

collection	PubMed
description	BACKGROUND: Medical data sharing is a major challenge in biomedicine and often hinders collaborative research. Because of privacy concerns, clinical notes cannot be shared directly. Much effort has been devoted to de-identifying clinical notes, but it remains very challenging to locate and scrub all sensitive elements from notes accurately and automatically. An alternative approach is to remove sentences that might contain sensitive terms related to personal information.
METHODS: A previous study introduced a frequency-based filtering approach that removes sentences containing low-frequency bigrams, improving privacy protection without significantly decreasing utility. Our work extends this method to clinical notes from distributed sources, with security and privacy considerations. We developed a novel secure protocol based on private set intersection and secure thresholding to identify uncommon and low-frequency terms, which can then guide sentence filtering.
RESULTS: Because the computational cost of the proposed framework depends mostly on the cardinality of the set intersection and on the number of data owners, we evaluated the framework in terms of these two factors. Experimental results demonstrate that the proposed method scales across various experimental settings. We also evaluated the framework in terms of data utility; this evaluation shows that the proposed method retains enough information for data analysis.
CONCLUSION: This work demonstrates the feasibility of using homomorphic encryption to develop a secure and efficient multi-party protocol.
id	pubmed-6731605
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	BMC Med Inform Decis Mak
published	BioMed Central, 2019-09-07
record date	2019-09-12
rights	© The Author(s) 2019. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
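The description says the earlier method removes sentences containing low-frequency bigrams, and that this work uses private set intersection and secure thresholding across data owners to identify uncommon and low-frequency terms. The sketch below computes, entirely in the clear, one plausible reading of that result: a bigram counts as safe only if it occurs at every site and its pooled count reaches a threshold, and any sentence containing a non-safe bigram is dropped. It performs no encryption and is not the authors' secure protocol. The tokenizer, the threshold value, the toy notes, and all function names are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

# Plaintext illustration only: no private set intersection or homomorphic
# encryption is performed here; every site's counts are visible to this code.

Bigram = Tuple[str, str]


def bigrams(sentence: str) -> List[Bigram]:
    """Adjacent token pairs; lowercase + whitespace split is an assumed tokenizer."""
    tokens = sentence.lower().split()
    return list(zip(tokens, tokens[1:]))


def site_counts(sentences: List[str]) -> Counter:
    """Bigram frequencies held locally by one (hypothetical) data owner."""
    counts: Counter = Counter()
    for s in sentences:
        counts.update(bigrams(s))
    return counts


def safe_bigrams(per_site: List[Counter], threshold: int) -> Set[Bigram]:
    """Bigrams present at every site whose pooled count reaches `threshold`."""
    # Step 1: intersect the sites' bigram sets (done privately with PSI in
    # the described framework, in the clear here).
    common = set(per_site[0])
    for counts in per_site[1:]:
        common &= set(counts)
    # Step 2: threshold the pooled counts (secure thresholding in the
    # described framework, a plain sum here).
    return {b for b in common if sum(c[b] for c in per_site) >= threshold}


def filter_sentences(sentences: List[str], safe: Set[Bigram]) -> List[str]:
    """Drop every sentence containing a bigram outside the safe set."""
    return [s for s in sentences if all(b in safe for b in bigrams(s))]


# Toy notes for two hypothetical sites; threshold=2 is an arbitrary choice.
site_a = ["patient denies chest pain", "seen by dr smith"]
site_b = ["patient denies chest pain today", "patient denies fever"]
per_site = [site_counts(site_a), site_counts(site_b)]
safe = safe_bigrams(per_site, threshold=2)
print(filter_sentences(site_a, safe))  # ['patient denies chest pain']
```

Under this reading, a bigram seen at only one site is never safe, however frequent it is there, and sentences with fewer than two tokens have no bigrams and are always kept. The description does not say how the original method handles either case.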
