Semi-Parallel logistic regression for GWAS on encrypted data
BACKGROUND: The sharing of biomedical data is crucial to enable scientific discoveries across institutions and improve health care. For example, genome-wide association studies (GWAS) based on a large number of samples can identify disease-causing genetic variants. The privacy concern, however, has...
Main authors: | Kim, Miran; Song, Yongsoo; Li, Baiyu; Micciancio, Daniele |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7372846/ https://www.ncbi.nlm.nih.gov/pubmed/32693798 http://dx.doi.org/10.1186/s12920-020-0724-z |
_version_ | 1783561394041389056 |
---|---|
author | Kim, Miran Song, Yongsoo Li, Baiyu Micciancio, Daniele |
author_facet | Kim, Miran Song, Yongsoo Li, Baiyu Micciancio, Daniele |
author_sort | Kim, Miran |
collection | PubMed |
description | BACKGROUND: The sharing of biomedical data is crucial to enable scientific discoveries across institutions and improve health care. For example, genome-wide association studies (GWAS) based on a large number of samples can identify disease-causing genetic variants. Privacy concerns, however, have become a major hurdle for data management and utilization. Homomorphic encryption is one of the most powerful cryptographic primitives that can address these privacy and security issues. It supports computation on encrypted data, so that data can be aggregated and arbitrary computations performed in an untrusted cloud environment without leaking sensitive information. METHODS: This paper presents a secure outsourcing solution to assess logistic regression models for quantitative traits to test their associations with genotypes. We adapt the semi-parallel training method by Sikorska et al., which builds a logistic regression model for covariates, followed by one-step parallelizable regressions on all individual single nucleotide polymorphisms (SNPs). In addition, we modify our underlying approximate homomorphic encryption scheme for performance improvement. RESULTS: We evaluated the performance of our solution through experiments on a real-world dataset. It achieves the best performance among homomorphic encryption systems for GWAS analysis in terms of both complexity and accuracy. For example, given a dataset consisting of 245 samples, each of which has 10643 SNPs and 3 covariates, our algorithm takes about 43 seconds to perform logistic regression-based genome-wide association analysis on encrypted data. CONCLUSIONS: We demonstrate the feasibility and scalability of our solution. |
format | Online Article Text |
id | pubmed-7372846 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-73728462020-07-21 Semi-Parallel logistic regression for GWAS on encrypted data Kim, Miran Song, Yongsoo Li, Baiyu Micciancio, Daniele BMC Med Genomics Research BACKGROUND: The sharing of biomedical data is crucial to enable scientific discoveries across institutions and improve health care. For example, genome-wide association studies (GWAS) based on a large number of samples can identify disease-causing genetic variants. The privacy concern, however, has become a major hurdle for data management and utilization. Homomorphic encryption is one of the most powerful cryptographic primitives which can address the privacy and security issues. It supports the computation on encrypted data, so that we can aggregate data and perform an arbitrary computation on an untrusted cloud environment without the leakage of sensitive information. METHODS: This paper presents a secure outsourcing solution to assess logistic regression models for quantitative traits to test their associations with genotypes. We adapt the semi-parallel training method by Sikorska et al., which builds a logistic regression model for covariates, followed by one-step parallelizable regressions on all individual single nucleotide polymorphisms (SNPs). In addition, we modify our underlying approximate homomorphic encryption scheme for performance improvement. RESULTS: We evaluated the performance of our solution through experiments on real-world dataset. It achieves the best performance of homomorphic encryption system for GWAS analysis in terms of both complexity and accuracy. For example, given a dataset consisting of 245 samples, each of which has 10643 SNPs and 3 covariates, our algorithm takes about 43 seconds to perform logistic regression based genome wide association analysis over encryption. CONCLUSIONS: We demonstrate the feasibility and scalability of our solution. 
BioMed Central 2020-07-21 /pmc/articles/PMC7372846/ /pubmed/32693798 http://dx.doi.org/10.1186/s12920-020-0724-z Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Kim, Miran Song, Yongsoo Li, Baiyu Micciancio, Daniele Semi-Parallel logistic regression for GWAS on encrypted data |
title | Semi-Parallel logistic regression for GWAS on encrypted data |
title_full | Semi-Parallel logistic regression for GWAS on encrypted data |
title_fullStr | Semi-Parallel logistic regression for GWAS on encrypted data |
title_full_unstemmed | Semi-Parallel logistic regression for GWAS on encrypted data |
title_short | Semi-Parallel logistic regression for GWAS on encrypted data |
title_sort | semi-parallel logistic regression for gwas on encrypted data |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7372846/ https://www.ncbi.nlm.nih.gov/pubmed/32693798 http://dx.doi.org/10.1186/s12920-020-0724-z |
work_keys_str_mv | AT kimmiran semiparallellogisticregressionforgwasonencrypteddata AT songyongsoo semiparallellogisticregressionforgwasonencrypteddata AT libaiyu semiparallellogisticregressionforgwasonencrypteddata AT miccianciodaniele semiparallellogisticregressionforgwasonencrypteddata |
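
The semi-parallel method summarized in the description (one logistic regression on the covariates, then a one-step regression for every SNP) can be sketched in plaintext as follows. This is a minimal illustration with NumPy on unencrypted data, not the authors' implementation and not the homomorphic-encryption pipeline. The function names (`fit_covariates`, `semi_parallel_scan`), the Newton iteration count, and the synthetic data are assumptions made for the example.

```python
import numpy as np


def sigmoid(t):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-t))


def fit_covariates(X, y, iters=25):
    """Fit logistic regression of y on the covariates X with Newton's method.

    X is (n, k) and is assumed to already contain an intercept column.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ beta)
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        hess = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def semi_parallel_scan(X, y, S):
    """One-step regressions for all SNP columns of S (n, m) at once.

    Returns the per-SNP effect estimates and their z-scores.
    """
    # Step 1: covariate-only model, fitted once and shared by all SNPs.
    beta = fit_covariates(X, y)
    p = sigmoid(X @ beta)
    w = p * (1.0 - p)                  # IRLS weights
    z = X @ beta + (y - p) / w         # IRLS working response

    # Step 2: remove the covariates from every SNP and from z using the
    # weighted projection A = (X^T W X)^{-1} X^T W.
    XtW = X.T * w                      # shape (k, n)
    A = np.linalg.solve(XtW @ X, XtW)  # shape (k, n)
    S_star = S - X @ (A @ S)
    z_star = z - X @ (A @ z)

    # Step 3: one weighted least-squares step per SNP, vectorized.
    denom = np.einsum("ij,i,ij->j", S_star, w, S_star)
    numer = S_star.T @ (w * z_star)
    b = numer / denom
    se = 1.0 / np.sqrt(denom)
    return b, b / se


if __name__ == "__main__":
    # Synthetic data using the dimensions quoted in the description
    # (245 samples, 10643 SNPs, 3 covariates); the values are random.
    rng = np.random.default_rng(0)
    n, m = 245, 10643
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
    S = rng.binomial(2, 0.3, size=(n, m)).astype(float)
    y = (rng.random(n) < sigmoid(0.3 * X[:, 1])).astype(float)
    b, z_scores = semi_parallel_scan(X, y, S)
    print(b.shape, z_scores.shape)
```

Because the covariate model is fitted only once, the per-SNP work reduces to a few matrix products that are independent across SNPs, which is what makes this step parallelizable.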