
Fast and Scalable Private Genotype Imputation Using Machine Learning and Partially Homomorphic Encryption

The recent advances in genome sequencing technologies provide unprecedented opportunities to understand the relationship between human genetic variation and diseases. However, genotyping whole genomes from a large cohort of individuals is still cost prohibitive. Imputation methods to predict genotyp...


Bibliographic Details
Main authors:	SARKAR, ESHA, CHIELLE, EDUARDO, GÜRSOY, GAMZE, MAZONKA, OLEG, GERSTEIN, MARK, MANIATAKOS, MICHAIL
Format:	Online Article Text
Language:	English
Published:	2021
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8409799/ https://www.ncbi.nlm.nih.gov/pubmed/34476144 http://dx.doi.org/10.1109/access.2021.3093005

collection	PubMed
description	Recent advances in genome sequencing technologies provide unprecedented opportunities to understand the relationship between human genetic variation and disease. However, genotyping whole genomes from a large cohort of individuals is still cost-prohibitive. Imputation methods that predict the genotypes of missing genetic variants are widely used, especially for genome-wide association studies. Accurate genotype imputation requires complex statistical methods. Because the problem is data- and compute-intensive, imputation is increasingly outsourced, raising serious privacy concerns. In this work, we investigate solutions for fast, scalable, and accurate privacy-preserving genotype imputation using Machine Learning (ML) and a standardized homomorphic encryption scheme, the Paillier cryptosystem. ML-based privacy-preserving inference has been largely optimized for computation-heavy non-linear functions in a single-output multi-class classification setting. However, having a large number of multi-class outputs per genome per individual calls for further optimizations and/or approximations specific to this application. Here we explore the effectiveness of linear models for genotype imputation, with the aim of converting them to privacy-preserving equivalents using standardized homomorphic encryption schemes. Our results show that the performance of our privacy-preserving genotype imputation method is equivalent to that of state-of-the-art plaintext solutions, achieving up to 99% micro area under the curve score, even on real-world large-scale datasets with up to 80,000 targets.
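
The abstract pairs linear models with the Paillier cryptosystem because Paillier is additively homomorphic: multiplying ciphertexts adds the plaintexts, and raising a ciphertext to a plaintext power multiplies by that constant, which is all a linear score needs. The block below is a minimal sketch of that idea, not the authors' implementation. The function names, the toy key size, the integer (fixed-point) weights, and the 0/1/2 genotype encoding are all assumptions made for illustration.

```python
import math
import secrets

def keygen(p=2**31 - 1, q=2**61 - 1):
    # Toy Mersenne primes for illustration only (assumption);
    # a real deployment needs a modulus of 2048 bits or more.
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    # c = (1 + m*n) * r^n mod n^2, with random r coprime to n.
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return ((1 + (m % n) * n) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    x = pow(c, lam, n * n)
    m = ((x - 1) // n) * mu % n
    # Map the upper half of the residue range back to negative integers.
    return m - n if m > n // 2 else m

def encrypted_linear_score(n, enc_x, weights, bias):
    # Computes Enc(bias + sum(w_i * x_i)) using only the public modulus n.
    # Weights and bias are plaintext integers held by the server (assumption).
    n2 = n * n
    acc = encrypt(n, bias)
    for c, w in zip(enc_x, weights):
        acc = (acc * pow(c, w % n, n2)) % n2
    return acc

# Usage: the client encrypts hypothetical genotype dosages (0, 1, or 2),
# the server scores them without seeing them, and the client decrypts.
n, priv = keygen()
enc_x = [encrypt(n, v) for v in [0, 1, 2, 1]]
enc_score = encrypted_linear_score(n, enc_x, [3, -2, 5, 1], -4)
print(decrypt(n, priv, enc_score))  # 5
```

In this sketch the server never holds the private key, so it learns nothing about the inputs or the score. Real-valued model weights would have to be scaled to integers before use, and the client would rescale after decryption.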
format	Online Article Text
id	pubmed-8409799
institution	National Center for Biotechnology Information
language	English
publishDate	2021
record_format	MEDLINE/PubMed
spelling	pubmed-8409799 2021-09-01 IEEE Access Article 2021-06-28 2021 /pmc/articles/PMC8409799/ /pubmed/34476144 http://dx.doi.org/10.1109/access.2021.3093005 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
title	Fast and Scalable Private Genotype Imputation Using Machine Learning and Partially Homomorphic Encryption