A New Statistic to Evaluate Imputation Reliability

BACKGROUND: As the amount of data from genome wide association studies grows dramatically, many interesting scientific questions require imputation to combine or expand datasets. However, there are two situations for which imputation has been problematic: (1) polymorphisms with low minor allele freq...

Bibliographic Details
Main authors: Lin, Peng, Hartz, Sarah M., Zhang, Zhehao, Saccone, Scott F., Wang, Jia, Tischfield, Jay A., Edenberg, Howard J., Kramer, John R., M.Goate, Alison, Bierut, Laura J., Rice, John P.
Format: Text
Language: English
Published: Public Library of Science 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2837741/
https://www.ncbi.nlm.nih.gov/pubmed/20300623
http://dx.doi.org/10.1371/journal.pone.0009697
author Lin, Peng
Hartz, Sarah M.
Zhang, Zhehao
Saccone, Scott F.
Wang, Jia
Tischfield, Jay A.
Edenberg, Howard J.
Kramer, John R.
M.Goate, Alison
Bierut, Laura J.
Rice, John P.
collection PubMed
description BACKGROUND: As the amount of data from genome-wide association studies grows dramatically, many interesting scientific questions require imputation to combine or expand datasets. However, imputation has been problematic in two situations: (1) polymorphisms with low minor allele frequency (MAF), and (2) datasets where subjects are genotyped on different platforms. Traditional measures of imputation quality cannot effectively address these problems.
METHODOLOGY/PRINCIPAL FINDINGS: We introduce a new statistic, the imputation quality score (IQS). To differentiate between well-imputed and poorly-imputed single nucleotide polymorphisms (SNPs), IQS adjusts the concordance between imputed and genotyped SNPs for chance. We first evaluated IQS in relation to minor allele frequency. Using a sample of subjects genotyped on the Illumina 1M array, we extracted those SNPs that were also on the Illumina 550K array and imputed them to the full set of the 1M SNPs. As expected, the average IQS value drops dramatically with a decrease in minor allele frequency, indicating that IQS appropriately adjusts for minor allele frequency. We then evaluated whether IQS can filter poorly-imputed SNPs in situations where cases and controls are genotyped on different platforms. Randomly dividing the data into “cases” and “controls”, we extracted the Illumina 550K SNPs from the cases and imputed the remaining Illumina 1M SNPs. The initial Q-Q plot for the test of association between cases and controls was grossly distorted (λ = 1.15) and had 4016 false positives, reflecting imputation error. After filtering out SNPs with IQS < 0.9, the Q-Q plot was acceptable and no false positives remained. We then evaluated the robustness of IQS computed independently on the two halves of the data. In both European Americans and African Americans the correlation was >0.99, demonstrating that a database of IQS values from common imputations could be used as an effective filter to combine data genotyped on different platforms.
CONCLUSIONS/SIGNIFICANCE: IQS effectively differentiates well-imputed and poorly-imputed SNPs. It is particularly useful for SNPs with low minor allele frequency and when datasets are genotyped on different platforms.
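The abstract describes IQS as concordance adjusted for chance and reports filtering at IQS < 0.9. The sketch below illustrates that idea under stated assumptions: it uses hard-called genotypes coded 0/1/2 and a Cohen's-kappa-style adjustment, (observed − chance) / (1 − chance). The function names `iqs` and `filter_snps` and the SNP identifiers are hypothetical and are not taken from the record; the authors' exact computation may differ (for example, in how imputation probabilities are handled).

```python
from collections import Counter


def iqs(genotyped, imputed):
    """Chance-adjusted concordance for one SNP (kappa-style sketch).

    genotyped, imputed: equal-length sequences of genotype calls
    (e.g. 0, 1, 2 copies of the minor allele), one entry per subject.
    """
    if len(genotyped) != len(imputed) or len(genotyped) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(genotyped)

    # Observed agreement: fraction of subjects whose calls match.
    observed = sum(g == i for g, i in zip(genotyped, imputed)) / n

    # Chance agreement: expected matches if the two sets of calls were
    # independent, computed from the marginal genotype counts.
    g_counts = Counter(genotyped)
    i_counts = Counter(imputed)
    chance = sum(g_counts[k] * i_counts[k] for k in g_counts) / (n * n)

    if chance == 1.0:
        # Both sources monomorphic for the same genotype: undefined.
        return float("nan")
    return (observed - chance) / (1.0 - chance)


def filter_snps(scores, threshold=0.9):
    """Keep SNP ids whose score is at least `threshold` (NaN is dropped)."""
    return {snp for snp, score in scores.items() if score >= threshold}


# A rare variant: 2 carriers among 100 subjects, imputation misses both.
rare_true = [0] * 98 + [1] * 2
rare_imputed = [0] * 100
# A common variant imputed perfectly.
common_true = [0, 1, 2, 1, 0, 2]
common_imputed = [0, 1, 2, 1, 0, 2]

scores = {
    "snp_rare": iqs(rare_true, rare_imputed),
    "snp_common": iqs(common_true, common_imputed),
}
kept = filter_snps(scores)
```

In this toy example the rare SNP has 98% raw concordance, yet its chance-adjusted score is 0 because always calling the major genotype achieves the same agreement. This is the low-MAF situation the abstract says traditional measures handle poorly.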
format Text
id pubmed-2837741
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-28377412010-03-18 A New Statistic to Evaluate Imputation Reliability Lin, Peng Hartz, Sarah M. Zhang, Zhehao Saccone, Scott F. Wang, Jia Tischfield, Jay A. Edenberg, Howard J. Kramer, John R. M.Goate, Alison Bierut, Laura J. Rice, John P. PLoS One Research Article Public Library of Science 2010-03-15 /pmc/articles/PMC2837741/ /pubmed/20300623 http://dx.doi.org/10.1371/journal.pone.0009697 Text en Lin et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title A New Statistic to Evaluate Imputation Reliability
topic Research Article