Semi-Parallel logistic regression for GWAS on encrypted data

BACKGROUND: The sharing of biomedical data is crucial to enable scientific discoveries across institutions and improve health care. For example, genome-wide association studies (GWAS) based on a large number of samples can identify disease-causing genetic variants. The privacy concern, however, has...

Bibliographic Details
Main Authors: Kim, Miran; Song, Yongsoo; Li, Baiyu; Micciancio, Daniele
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7372846/
https://www.ncbi.nlm.nih.gov/pubmed/32693798
http://dx.doi.org/10.1186/s12920-020-0724-z
_version_ 1783561394041389056
author Kim, Miran
Song, Yongsoo
Li, Baiyu
Micciancio, Daniele
collection PubMed
description BACKGROUND: The sharing of biomedical data is crucial to enable scientific discoveries across institutions and improve health care. For example, genome-wide association studies (GWAS) based on a large number of samples can identify disease-causing genetic variants. Privacy concerns, however, have become a major hurdle for data management and utilization. Homomorphic encryption is one of the most powerful cryptographic primitives that can address these privacy and security issues. It supports computation on encrypted data, so that we can aggregate data and perform arbitrary computations in an untrusted cloud environment without leaking sensitive information. METHODS: This paper presents a secure outsourcing solution to assess logistic regression models for quantitative traits to test their associations with genotypes. We adapt the semi-parallel training method by Sikorska et al., which builds a logistic regression model for covariates, followed by one-step parallelizable regressions on all individual single nucleotide polymorphisms (SNPs). In addition, we modify our underlying approximate homomorphic encryption scheme to improve performance. RESULTS: We evaluated the performance of our solution through experiments on a real-world dataset. It achieves the best performance of a homomorphic encryption system for GWAS analysis in terms of both complexity and accuracy. For example, given a dataset consisting of 245 samples, each of which has 10643 SNPs and 3 covariates, our algorithm takes about 43 seconds to perform logistic regression-based genome-wide association analysis on encrypted data. CONCLUSIONS: We demonstrate the feasibility and scalability of our solution.
format Online
Article
Text
id pubmed-7372846
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7372846 2020-07-21 Semi-Parallel logistic regression for GWAS on encrypted data Kim, Miran Song, Yongsoo Li, Baiyu Micciancio, Daniele BMC Med Genomics Research
BioMed Central 2020-07-21 /pmc/articles/PMC7372846/ /pubmed/32693798 http://dx.doi.org/10.1186/s12920-020-0724-z Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Semi-Parallel logistic regression for GWAS on encrypted data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7372846/
https://www.ncbi.nlm.nih.gov/pubmed/32693798
http://dx.doi.org/10.1186/s12920-020-0724-z
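
The METHODS part of the description names a two-stage "semi-parallel" procedure: fit a logistic regression on the covariates once, then run a one-step regression for every SNP in parallel. The following is a minimal plaintext sketch of that idea in NumPy, written only to illustrate the structure. It is not the authors' implementation and involves no encryption; the function names, the Newton iteration count, and the array layout (samples in rows) are assumptions made for this example.

```python
import numpy as np


def sigmoid(t):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-t))


def fit_covariates(X, y, n_iter=25):
    """Stage 1: Newton (IRLS) fit of the covariate-only model y ~ X."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        w = p * (1.0 - p)                      # diagonal of the weight matrix W
        grad = X.T @ (y - p)
        hess = X.T @ (X * w[:, None])          # X^T W X
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def semi_parallel_snp_tests(X, S, y):
    """Stage 2: one-step estimate and z-score for every SNP column of S.

    X is (n, k) covariates including an intercept column,
    S is (n, m) genotypes, y is (n,) binary phenotypes.
    """
    beta = fit_covariates(X, y)
    p = sigmoid(X @ beta)
    w = p * (1.0 - p)
    z = X @ beta + (y - p) / w                 # working response

    XtW = X.T * w                              # X^T W, shape (k, n)
    A = np.linalg.solve(XtW @ X, XtW)          # (X^T W X)^{-1} X^T W

    # Remove the covariate contribution from all SNPs and from z at once.
    S_star = S - X @ (A @ S)
    z_star = z - X @ (A @ z)

    num = S_star.T @ (w * z_star)              # s*^T W z*  for each SNP
    den = np.einsum("ij,ij->j", S_star, S_star * w[:, None])  # s*^T W s*
    b = num / den                              # one-step SNP coefficients
    z_scores = b * np.sqrt(den)                # b / se, with se = 1/sqrt(den)
    return b, z_scores
```

All SNPs share the same stage-1 quantities, so stage 2 reduces to a few matrix products over the whole genotype matrix, which is what makes the per-SNP step parallelizable. For each SNP, the coefficient computed here equals the SNP component of a single Newton step on the full model started from the covariate-only fit.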