An empirical evaluation of imputation accuracy for association statistics reveals increased type-I error rates in genome-wide associations

BACKGROUND: Genome-wide association studies (GWAS) are becoming the approach of choice to identify genetic determinants of complex phenotypes and common diseases. The astonishing amount of generated data and the use of distinct genotyping platforms with variable genomic coverage are still analytical...

Bibliographic Details
Main Authors: Almeida, Marcio AA, Oliveira, Paulo SL, Pereira, Tiago V, Krieger, José E, Pereira, Alexandre C
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3224203/
https://www.ncbi.nlm.nih.gov/pubmed/21251252
http://dx.doi.org/10.1186/1471-2156-12-10
author Almeida, Marcio AA
Oliveira, Paulo SL
Pereira, Tiago V
Krieger, José E
Pereira, Alexandre C
collection PubMed
description BACKGROUND: Genome-wide association studies (GWAS) are becoming the approach of choice for identifying genetic determinants of complex phenotypes and common diseases. The very large volume of data generated and the use of distinct genotyping platforms with variable genomic coverage remain analytical challenges. Imputation algorithms combine information from directly genotyped markers with the haplotype structure of the population of interest to infer poorly genotyped or missing markers, and are considered a near-zero-cost way to compare and combine data generated in different studies. Several reports have stated that imputed markers have acceptable overall accuracy, but no published report has performed a pairwise comparison of imputed and empirical association statistics for a complete set of GWAS markers.
RESULTS: In this report we identified a total of 73 imputed markers that yielded a nominally statistically significant association at P < 10^-5 for type 2 diabetes mellitus and compared them with results based on empirical allele frequencies. Despite their overall high correlation, association statistics based on imputed frequencies were discordant in 35 of the 73 (47%) associated markers, considerably inflating the type I error rate of imputed markers. We comprehensively tested several quality thresholds, the haplotype structure underlying imputed markers, and the use of flanking markers as predictors of inaccurate association statistics derived from imputed markers.
CONCLUSIONS: Our results suggest that association statistics from imputed markers that fall in specific minor allele frequency (MAF) ranges, lie in weak linkage disequilibrium blocks, or deviate strongly from local patterns of association are prone to inflated false-positive association signals. The present study highlights the potential of imputation procedures and proposes simple procedures for selecting the best imputed markers for follow-up genotyping studies.
format Online
Article
Text
id pubmed-3224203
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3224203 2011-11-27 BMC Genet Research Article BioMed Central 2011-01-20 /pmc/articles/PMC3224203/ /pubmed/21251252 http://dx.doi.org/10.1186/1471-2156-12-10 Text en Copyright ©2011 Almeida et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title An empirical evaluation of imputation accuracy for association statistics reveals increased type-I error rates in genome-wide associations
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3224203/
https://www.ncbi.nlm.nih.gov/pubmed/21251252
http://dx.doi.org/10.1186/1471-2156-12-10