
A genotype imputation method for de-identified haplotype reference information by using recurrent neural network

Genotype imputation estimates the genotypes of unobserved variants using the genotype data of other observed variants based on a collection of haplotypes for thousands of individuals, which is known as a haplotype reference panel. In general, more accurate imputation results were obtained using a la...


Bibliographic Details
Main authors:	Kojima, Kaname; Tadaka, Shu; Katsuoka, Fumiki; Tamiya, Gen; Yamamoto, Masayuki; Kinoshita, Kengo
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2020
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7529210/ https://www.ncbi.nlm.nih.gov/pubmed/33001993 http://dx.doi.org/10.1371/journal.pcbi.1008207

author	Kojima, Kaname; Tadaka, Shu; Katsuoka, Fumiki; Tamiya, Gen; Yamamoto, Masayuki; Kinoshita, Kengo
collection	PubMed
description	Genotype imputation estimates the genotypes of unobserved variants from the genotype data of observed variants, based on a collection of haplotypes for thousands of individuals known as a haplotype reference panel. In general, larger haplotype reference panels yield more accurate imputation results. Most existing genotype imputation methods explicitly require the haplotype reference panel in its exact form, but access to haplotype data is often limited because it requires agreements from the donors. Since de-identified information such as summary statistics or model parameters can be used publicly, imputation methods that use de-identified haplotype reference information might improve the quality of imputation results where access to the haplotype data is limited. In this study, we proposed a novel imputation method that handles the reference panel as its model parameters by using a bidirectional recurrent neural network (RNN). The model parameters are presented as de-identified information from which restoring individual-level genotype data is almost impossible. We demonstrated that the proposed method provides imputation accuracy comparable to that of existing imputation methods on haplotype datasets from the 1000 Genomes Project (1KGP) and the Haplotype Reference Consortium. We also considered a scenario in which a subset of the haplotypes in the reference panel is available only in de-identified form. In the evaluation using the 1KGP dataset under this scenario, the imputation accuracy of the proposed method is much higher than that of the existing imputation methods. We therefore conclude that our RNN-based method is quite promising for further promoting the sharing of sensitive genome data amid the recent movement toward protecting individuals’ privacy.
format	Online Article Text
id	pubmed-7529210
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-7529210 2020-10-02 PLoS Comput Biol Research Article Public Library of Science 2020-10-01 /pmc/articles/PMC7529210/ /pubmed/33001993 http://dx.doi.org/10.1371/journal.pcbi.1008207 Text en © 2020 Kojima et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	A genotype imputation method for de-identified haplotype reference information by using recurrent neural network
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7529210/ https://www.ncbi.nlm.nih.gov/pubmed/33001993 http://dx.doi.org/10.1371/journal.pcbi.1008207
