Cargando…

Secure approximation of edit distance on genomic data

BACKGROUND: Edit distance is a well established metric to quantify how dissimilar two strings are by counting the minimum number of operations required to transform one string into the other. It is utilized in the domain of human genomic sequence similarity as it captures the requirements and leads...

Descripción completa

Detalles Bibliográficos
Autores principales: Aziz, Md Momin Al, Alhadidi, Dima, Mohammed, Noman
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2017
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5547448/
https://www.ncbi.nlm.nih.gov/pubmed/28786362
http://dx.doi.org/10.1186/s12920-017-0279-9
_version_ 1783255691741364224
author Aziz, Md Momin Al
Alhadidi, Dima
Mohammed, Noman
author_facet Aziz, Md Momin Al
Alhadidi, Dima
Mohammed, Noman
author_sort Aziz, Md Momin Al
collection PubMed
description BACKGROUND: Edit distance is a well established metric to quantify how dissimilar two strings are by counting the minimum number of operations required to transform one string into the other. It is utilized in the domain of human genomic sequence similarity as it captures the requirements and leads to a better diagnosis of diseases. However, in addition to the computational complexity due to the large genomic sequence length, the privacy of these sequences are highly important. As these genomic sequences are unique and can identify an individual, these cannot be shared in a plaintext. METHODS: In this paper, we propose two different approximation methods to securely compute the edit distance among genomic sequences. We use shingling, private set intersection methods, the banded alignment algorithm, and garbled circuits to implement these methods. We experimentally evaluate these methods and discuss both advantages and limitations. RESULTS: Experimental results show that our first approximation method is fast and achieves similar accuracy compared to existing techniques. However, for longer genomic sequences, both the existing techniques and our proposed first method are unable to achieve a good accuracy. On the other hand, our second approximation method is able to achieve higher accuracy on such datasets. However, the second method is relatively slower than the first proposed method. CONCLUSION: The proposed algorithms are generally accurate, time-efficient and can be applied individually and jointly as they have complimentary properties (runtime vs. accuracy) on different types of datasets. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12920-017-0279-9) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5547448
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-55474482017-08-09 Secure approximation of edit distance on genomic data Aziz, Md Momin Al Alhadidi, Dima Mohammed, Noman BMC Med Genomics Research BACKGROUND: Edit distance is a well established metric to quantify how dissimilar two strings are by counting the minimum number of operations required to transform one string into the other. It is utilized in the domain of human genomic sequence similarity as it captures the requirements and leads to a better diagnosis of diseases. However, in addition to the computational complexity due to the large genomic sequence length, the privacy of these sequences are highly important. As these genomic sequences are unique and can identify an individual, these cannot be shared in a plaintext. METHODS: In this paper, we propose two different approximation methods to securely compute the edit distance among genomic sequences. We use shingling, private set intersection methods, the banded alignment algorithm, and garbled circuits to implement these methods. We experimentally evaluate these methods and discuss both advantages and limitations. RESULTS: Experimental results show that our first approximation method is fast and achieves similar accuracy compared to existing techniques. However, for longer genomic sequences, both the existing techniques and our proposed first method are unable to achieve a good accuracy. On the other hand, our second approximation method is able to achieve higher accuracy on such datasets. However, the second method is relatively slower than the first proposed method. CONCLUSION: The proposed algorithms are generally accurate, time-efficient and can be applied individually and jointly as they have complimentary properties (runtime vs. accuracy) on different types of datasets. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12920-017-0279-9) contains supplementary material, which is available to authorized users. BioMed Central 2017-07-26 /pmc/articles/PMC5547448/ /pubmed/28786362 http://dx.doi.org/10.1186/s12920-017-0279-9 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Aziz, Md Momin Al
Alhadidi, Dima
Mohammed, Noman
Secure approximation of edit distance on genomic data
title Secure approximation of edit distance on genomic data
title_full Secure approximation of edit distance on genomic data
title_fullStr Secure approximation of edit distance on genomic data
title_full_unstemmed Secure approximation of edit distance on genomic data
title_short Secure approximation of edit distance on genomic data
title_sort secure approximation of edit distance on genomic data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5547448/
https://www.ncbi.nlm.nih.gov/pubmed/28786362
http://dx.doi.org/10.1186/s12920-017-0279-9
work_keys_str_mv AT azizmdmominal secureapproximationofeditdistanceongenomicdata
AT alhadididima secureapproximationofeditdistanceongenomicdata
AT mohammednoman secureapproximationofeditdistanceongenomicdata