Fast and Scalable Private Genotype Imputation Using Machine Learning and Partially Homomorphic Encryption
The recent advances in genome sequencing technologies provide unprecedented opportunities to understand the relationship between human genetic variation and diseases. However, genotyping whole genomes from a large cohort of individuals is still cost prohibitive. Imputation methods to predict genotyp...
Main authors: | Sarkar, Esha; Chielle, Eduardo; Gürsoy, Gamze; Mazonka, Oleg; Gerstein, Mark; Maniatakos, Michail |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8409799/ https://www.ncbi.nlm.nih.gov/pubmed/34476144 http://dx.doi.org/10.1109/access.2021.3093005 |
_version_ | 1783747051805212672 |
---|---|
author | SARKAR, ESHA CHIELLE, EDUARDO GÜRSOY, GAMZE MAZONKA, OLEG GERSTEIN, MARK MANIATAKOS, MICHAIL |
author_sort | SARKAR, ESHA |
collection | PubMed |
description | The recent advances in genome sequencing technologies provide unprecedented opportunities to understand the relationship between human genetic variation and diseases. However, genotyping whole genomes from a large cohort of individuals is still cost prohibitive. Imputation methods to predict genotypes of missing genetic variants are widely used, especially for genome-wide association studies. Accurate genotype imputation requires complex statistical methods. Due to the data and computing-intensive nature of the problem, imputation is increasingly outsourced, raising serious privacy concerns. In this work, we investigate solutions for fast, scalable, and accurate privacy-preserving genotype imputation using Machine Learning (ML) and a standardized homomorphic encryption scheme, Paillier cryptosystem. ML-based privacy-preserving inference has been largely optimized for computation-heavy non-linear functions in a single-output multi-class classification setting. However, having a large number of multi-class outputs per genome per individual calls for further optimizations and/or approximations specific to this application. Here we explore the effectiveness of linear models for genotype imputation to convert them to privacy-preserving equivalents using standardized homomorphic encryption schemes. Our results show that performance of our privacy-preserving genotype imputation method is equivalent to the state-of-the-art plaintext solutions, achieving up to 99% micro area under curve score, even on real-world large-scale datasets up to 80,000 targets. |
format | Online Article Text |
id | pubmed-8409799 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
record_format | MEDLINE/PubMed |
spelling | pubmed-84097992021-09-01 Fast and Scalable Private Genotype Imputation Using Machine Learning and Partially Homomorphic Encryption SARKAR, ESHA CHIELLE, EDUARDO GÜRSOY, GAMZE MAZONKA, OLEG GERSTEIN, MARK MANIATAKOS, MICHAIL IEEE Access Article The recent advances in genome sequencing technologies provide unprecedented opportunities to understand the relationship between human genetic variation and diseases. However, genotyping whole genomes from a large cohort of individuals is still cost prohibitive. Imputation methods to predict genotypes of missing genetic variants are widely used, especially for genome-wide association studies. Accurate genotype imputation requires complex statistical methods. Due to the data and computing-intensive nature of the problem, imputation is increasingly outsourced, raising serious privacy concerns. In this work, we investigate solutions for fast, scalable, and accurate privacy-preserving genotype imputation using Machine Learning (ML) and a standardized homomorphic encryption scheme, Paillier cryptosystem. ML-based privacy-preserving inference has been largely optimized for computation-heavy non-linear functions in a single-output multi-class classification setting. However, having a large number of multi-class outputs per genome per individual calls for further optimizations and/or approximations specific to this application. Here we explore the effectiveness of linear models for genotype imputation to convert them to privacy-preserving equivalents using standardized homomorphic encryption schemes. Our results show that performance of our privacy-preserving genotype imputation method is equivalent to the state-of-the-art plaintext solutions, achieving up to 99% micro area under curve score, even on real-world large-scale datasets up to 80,000 targets. 
2021-06-28 2021 /pmc/articles/PMC8409799/ /pubmed/34476144 http://dx.doi.org/10.1109/access.2021.3093005 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ |
spellingShingle | Article SARKAR, ESHA CHIELLE, EDUARDO GÜRSOY, GAMZE MAZONKA, OLEG GERSTEIN, MARK MANIATAKOS, MICHAIL Fast and Scalable Private Genotype Imputation Using Machine Learning and Partially Homomorphic Encryption |
title | Fast and Scalable Private Genotype Imputation Using Machine Learning and Partially Homomorphic Encryption |
title_sort | fast and scalable private genotype imputation using machine learning and partially homomorphic encryption |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8409799/ https://www.ncbi.nlm.nih.gov/pubmed/34476144 http://dx.doi.org/10.1109/access.2021.3093005 |