
On Combining Reference Data to Improve Imputation Accuracy

Genotype imputation is an important tool in human genetics studies, which uses reference sets with known genotypes and prior knowledge on linkage disequilibrium and recombination rates to infer un-typed alleles for human genetic variations at a low cost. The reference sets used by current imputation...


Bibliographic Details
Main Authors:	Chen, Jun; Zhang, Ji-Gang; Li, Jian; Pei, Yu-Fang; Deng, Hong-Wen
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2013
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3559437/ https://www.ncbi.nlm.nih.gov/pubmed/23383238 http://dx.doi.org/10.1371/journal.pone.0055600

author	Chen, Jun; Zhang, Ji-Gang; Li, Jian; Pei, Yu-Fang; Deng, Hong-Wen
collection	PubMed
description	Genotype imputation is an important tool in human genetics studies. It uses reference sets with known genotypes, together with prior knowledge of linkage disequilibrium and recombination rates, to infer untyped alleles for human genetic variants at low cost. The reference sets used by current imputation approaches are based on HapMap data and/or on recently available next-generation sequencing (NGS) data, such as data generated by the 1000 Genomes Project. However, because NGS data sets differ in coverage and call rate, integrating NGS data sets of different accuracy with previously available reference data is not an easy task and has not been systematically investigated. In this study, we performed a comprehensive assessment of three strategies for using NGS data and previously available reference data in genotype imputation, on both simulated and empirical data, in order to obtain guidelines for optimal reference set construction. Briefly, we considered three strategies: strategy 1 uses a single NGS data set as the reference; strategy 2 imputes samples using multiple individual data sets of different accuracy as independent references and then combines the imputed samples, selecting the results based on the high-accuracy reference where they overlap; and strategy 3 combines multiple available data sets into a single reference after imputing them against each other. We used three software packages (MACH, IMPUTE2 and BEAGLE) to assess the performance of these three strategies. Our results show that strategy 2 and strategy 3 have higher imputation accuracy than strategy 1. In particular, strategy 2 is the best strategy across all the conditions that we investigated, producing the best imputation accuracy for rare variants. Our study is helpful in guiding the application of imputation methods in next-generation association analyses.
format	Online Article Text
id	pubmed-3559437
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-3559437 2013-02-04 On Combining Reference Data to Improve Imputation Accuracy. Chen, Jun; Zhang, Ji-Gang; Li, Jian; Pei, Yu-Fang; Deng, Hong-Wen. PLoS One. Research Article. Public Library of Science 2013-01-30 /pmc/articles/PMC3559437/ /pubmed/23383238 http://dx.doi.org/10.1371/journal.pone.0055600 Text en © 2013 Chen et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	On Combining Reference Data to Improve Imputation Accuracy
topic	Research Article
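
The combining step of strategy 2 in the abstract (impute against each reference independently, then prefer the result from the high-accuracy reference where the results overlap) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function name `merge_by_reference_priority`, the dictionary representation of imputed dosages, and the variant IDs and values are all assumptions made for the example, and the imputation itself (done in the study with MACH, IMPUTE2 or BEAGLE) is not shown.

```python
from typing import Dict, List

# Hypothetical representation: variant ID -> imputed allele dosage (0.0 to 2.0).
Dosages = Dict[str, float]


def merge_by_reference_priority(imputed_sets: List[Dosages]) -> Dosages:
    """Merge per-reference imputation results for one sample.

    `imputed_sets` must be ordered from the most accurate reference to the
    least accurate one. Where several results cover the same variant, the
    value from the earliest (highest-accuracy) set is kept.
    """
    merged: Dosages = {}
    for dosages in imputed_sets:
        for variant_id, dosage in dosages.items():
            # setdefault leaves an existing entry untouched, so values from
            # higher-priority references are never overwritten.
            merged.setdefault(variant_id, dosage)
    return merged


# Toy values, made up for illustration only.
from_high_accuracy_ref = {"rs1": 0.1, "rs2": 1.9}
from_low_accuracy_ref = {"rs2": 1.2, "rs3": 0.8}

merged = merge_by_reference_priority(
    [from_high_accuracy_ref, from_low_accuracy_ref]
)
print(merged)
```

Here `rs2` is covered by both results, so the value from the high-accuracy reference is kept, while `rs1` and `rs3` are each taken from the only result that covers them.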
