
A Sequence Obfuscation Method for Protecting Personal Genomic Privacy

Bibliographic Details
Main authors: Wan, Shibiao; Wang, Jieqiong
Format: Online article, text
Language: English
Published: Frontiers Media S.A., 2022
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9043694/
https://www.ncbi.nlm.nih.gov/pubmed/35495121
http://dx.doi.org/10.3389/fgene.2022.876686

Abstract
With technological advances in recent decades, whole-genome sequencing of an individual has become feasible and affordable. As a result, individual genomic sequences are now produced and collected at large scale for genetic medical diagnosis and cancer drug discovery, which in turn poses serious challenges to the protection of personal genomic privacy. Methods that keep personal genomic data both usable and confidential are therefore urgently needed. Existing genomic privacy-protection methods either spend considerable time on encryption or recover the data with low accuracy. To address these problems, this paper proposes IterMegaBLAST, a sequence-similarity-based obfuscation method for fast and reliable protection of personal genomic privacy. Specifically, given a sequence selected at random from a dataset of genomic sequences, we first use MegaBLAST to find the most similar sequence in the dataset. These two aligned sequences form a cluster, for which an obfuscated sequence is generated via a DNA generalization lattice scheme. This procedure is repeated until every sequence in the dataset has been clustered and its obfuscated sequence generated. Experimental results on benchmark datasets show that, at the same degree of anonymity, IterMegaBLAST significantly outperforms existing state-of-the-art approaches in both utility accuracy and time complexity.
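The clustering loop described in the abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: a simple per-position identity score (the hypothetical `identity` helper) stands in for MegaBLAST, input sequences are assumed to be pre-aligned and of equal length, the generalization lattice is an assumed simplified one built from IUPAC ambiguity codes (A/G → R, C/T → Y, anything else → N), and when the sequence count is odd the leftover sequence is folded into the last cluster. The names `generalize_column`, `generalize`, and `iter_obfuscate` are hypothetical.

```python
import random

# ASSUMPTION: simplified DNA generalization lattice using IUPAC ambiguity codes.
# Each node maps to the set of nucleotides it can stand for; the paper's exact
# lattice may differ.
LATTICE = {
    "A": frozenset("A"), "C": frozenset("C"),
    "G": frozenset("G"), "T": frozenset("T"),
    "R": frozenset("AG"),    # purine
    "Y": frozenset("CT"),    # pyrimidine
    "N": frozenset("ACGT"),  # any nucleotide (lattice root)
}
# Nodes ordered from most specific to most general.
ORDER = ["A", "C", "G", "T", "R", "Y", "N"]


def generalize_column(symbols):
    """Return the most specific lattice node covering every symbol in a column."""
    try:
        covered = frozenset().union(*(LATTICE[s] for s in symbols))
    except KeyError as err:
        raise ValueError(f"unsupported symbol: {err}") from None
    for node in ORDER:
        if covered <= LATTICE[node]:
            return node  # "N" covers everything, so the loop always returns


def identity(a, b):
    """Fraction of identical positions; a HYPOTHETICAL stand-in for MegaBLAST."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), 1)


def generalize(members):
    """Obfuscate a cluster of equal-length sequences column by column."""
    return "".join(generalize_column(col) for col in zip(*members))


def iter_obfuscate(seqs, seed=0):
    """Pair a randomly chosen sequence with its most similar unclustered
    sequence until none remain, then replace every cluster member with the
    cluster's generalized sequence. Returns results in input order."""
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("this sketch assumes pre-aligned, equal-length sequences")
    rng = random.Random(seed)
    remaining = list(range(len(seqs)))
    clusters = []
    while len(remaining) >= 2:
        query = remaining.pop(rng.randrange(len(remaining)))
        best = max(remaining, key=lambda j: identity(seqs[query], seqs[j]))
        remaining.remove(best)
        clusters.append([query, best])
    if remaining:
        # ASSUMPTION: with an odd count, fold the leftover into the last cluster.
        if not clusters:
            raise ValueError("need at least two sequences")
        clusters[-1].append(remaining.pop())
    result = [None] * len(seqs)
    for members in clusters:
        obfuscated = generalize([seqs[i] for i in members])
        for i in members:
            result[i] = obfuscated
    return result


print(iter_obfuscate(["ACGT", "ACGA", "GCTT", "GCTA"]))
# ['ACGN', 'ACGN', 'GCTN', 'GCTN']
```

Because every cluster holds at least two sequences, each obfuscated sequence in this sketch is shared by at least two individuals. In the example, the two members of each pair differ only at the last position (T vs. A), which crosses the purine/pyrimidine boundary and is therefore generalized to N.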

Journal: Front Genet (Frontiers in Genetics)
Published online: 2022-04-13
Copyright © 2022 Wan and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.