
SNP imputation bias reduces effect size determination

Imputation is a commonly used technique that exploits linkage disequilibrium to infer missing genotypes in genetic datasets, using a well-characterized reference population. While there is agreement that the reference population has to match the ethnicity of the query dataset, it is common practice...
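Imputation infers a masked genotype from reference haplotypes that share the surrounding alleles, meaning they are in linkage disequilibrium with it. Production imputation software models this with hidden Markov models over the reference panel. The Python sketch below is a deliberately simplified nearest-haplotype vote and is not the method used in the paper. The function name `impute_masked` and the toy haplotypes are hypothetical. Allele 1 at the masked marker is assumed to be the risk allele. The example is only meant to show how the composition of the reference panel can pull an imputed allele one way or the other.

```python
from collections import Counter

def impute_masked(query, reference, masked_idx):
    """Toy nearest-haplotype imputation (hypothetical helper, not a real tool).

    query: haplotype as a list of 0/1 alleles, with None at masked_idx.
    reference: list of fully observed reference haplotypes (lists of 0/1).
    Returns the majority allele at masked_idx among the reference haplotypes
    that agree with the query at the largest number of observed positions.
    """
    observed = [i for i in range(len(query)) if i != masked_idx]

    def shared(hap):
        # Number of observed positions where this reference haplotype matches the query
        return sum(hap[i] == query[i] for i in observed)

    best = max(shared(h) for h in reference)
    nearest = [h for h in reference if shared(h) == best]
    # Majority vote at the masked marker; ties resolve to the first allele seen
    return Counter(h[masked_idx] for h in nearest).most_common(1)[0][0]

# Hypothetical three-marker haplotypes; the middle marker (index 1) is masked.
# Allele 1 at index 1 is assumed to be the risk allele.
controls = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1]]
cases = [[1, 1, 0], [1, 1, 0], [1, 0, 0], [0, 1, 1]]
query = [1, None, 0]

print(impute_masked(query, controls, 1))  # 0: non-risk allele
print(impute_masked(query, cases, 1))     # 1: risk allele
```

With the control-only panel, the query is imputed with the non-risk allele. The case panel yields the risk allele instead. This mirrors, in miniature, the direction of bias the abstract describes, but the data here are purely illustrative.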


Bibliographic Details
Main authors:	Khankhanian, Pouya; Din, Lennox; Caillier, Stacy J.; Gourraud, Pierre-Antoine; Baranzini, Sergio E.
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2015
Subjects:	Genetics
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4321633/ https://www.ncbi.nlm.nih.gov/pubmed/25709616 http://dx.doi.org/10.3389/fgene.2015.00030

author	Khankhanian, Pouya; Din, Lennox; Caillier, Stacy J.; Gourraud, Pierre-Antoine; Baranzini, Sergio E.
collection	PubMed
description	Imputation is a commonly used technique that exploits linkage disequilibrium to infer missing genotypes in genetic datasets, using a well-characterized reference population. While there is agreement that the reference population has to match the ethnicity of the query dataset, it is common practice to use the same reference to impute genotypes for a wide variety of phenotypes. We hypothesized that using a reference composed of samples with a phenotype different from that of the query dataset would introduce imputation bias. To test this hypothesis, we used GWAS datasets from Amyotrophic Lateral Sclerosis (ALS), Parkinson Disease (PD), and Crohn's Disease (CD). First, we masked and then imputed 100 disease-associated markers and 100 non-associated markers from each study. Two references for imputation were used in parallel: one consisting of healthy controls and another consisting of patients with the same disease. We assessed the discordance (imprecision) and bias (inaccuracy) of imputation by comparing predicted genotypes to those assayed by SNP-chip. We also assessed the bias in the observed effect size when the predicted genotypes were used in a GWAS. When healthy controls were used as the reference for imputation, a significant bias was observed, particularly in the disease-associated markers. Using cases as the reference significantly attenuated this bias. For nearly all markers, the direction of the bias favored the non-risk allele. In GWAS of the three diseases (with healthy controls from the 1000 Genomes Project as the reference), the mean odds ratio (OR) for disease-associated markers obtained by imputation was lower than that obtained using the original assayed genotypes. We found that the bias is inherent to imputation, as using different methods did not alter the results. In conclusion, imputation is a powerful method to predict genotypes and estimate genetic risk for GWAS. However, a careful choice of reference population is needed to minimize the biases inherent to this approach.
format	Online Article Text
id	pubmed-4321633
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-4321633 2015-02-23 Front Genet Genetics Frontiers Media S.A. 2015-02-09 /pmc/articles/PMC4321633/ /pubmed/25709616 http://dx.doi.org/10.3389/fgene.2015.00030 Text en Copyright © 2015 Khankhanian, Din, Caillier, Gourraud and Baranzini. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	SNP imputation bias reduces effect size determination
topic	Genetics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4321633/ https://www.ncbi.nlm.nih.gov/pubmed/25709616 http://dx.doi.org/10.3389/fgene.2015.00030
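The abstract compares imputed genotypes with SNP-chip genotypes and then compares odds ratios. A minimal Python sketch of those comparisons follows, under assumptions the record does not state. Genotypes are coded as risk-allele dosages (0, 1, 2). Discordance is taken as the fraction of mismatched calls. Bias is taken as the mean signed dosage error, where negative values favor the non-risk allele. The effect size is a simple allelic odds ratio. The function names and data are hypothetical and may not match the authors' exact definitions.

```python
def discordance(assayed, imputed):
    """Fraction of samples whose imputed genotype differs from the assayed one."""
    return sum(a != i for a, i in zip(assayed, imputed)) / len(assayed)

def bias(assayed, imputed):
    """Mean signed error in risk-allele dosage; negative = skewed toward the non-risk allele."""
    return sum(i - a for a, i in zip(assayed, imputed)) / len(assayed)

def allelic_odds_ratio(case_dosages, control_dosages):
    """Allelic OR from a 2x2 table of risk/non-risk allele counts (no continuity correction)."""
    case_risk = sum(case_dosages)
    case_other = 2 * len(case_dosages) - case_risk
    ctrl_risk = sum(control_dosages)
    ctrl_other = 2 * len(control_dosages) - ctrl_risk
    return (case_risk * ctrl_other) / (case_other * ctrl_risk)

# Hypothetical risk-allele dosages at one disease-associated marker
cases_assayed = [2, 1, 1, 0, 2, 1]
cases_imputed = [1, 1, 0, 0, 2, 1]  # two calls shifted toward the non-risk allele
controls = [0, 1, 0, 0, 1, 0]

print(discordance(cases_assayed, cases_imputed))    # 0.333... (2 of 6 calls differ)
print(bias(cases_assayed, cases_imputed))           # -0.333...
print(allelic_odds_ratio(cases_assayed, controls))  # 7.0
print(allelic_odds_ratio(cases_imputed, controls))  # 3.571...
```

In this toy example, shifting two case genotypes toward the non-risk allele roughly halves the OR. In real data, the size of any attenuation depends on the marker and the reference panel.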


Similar Items