Performance of Genotype Imputation for Low Frequency and Rare Variants from the 1000 Genomes
Genotype imputation is now routinely applied in genome-wide association studies (GWAS) and meta-analyses. However, most imputations have been run using HapMap samples as the reference, and imputation of low frequency and rare variants (minor allele frequency (MAF) < 5%) has not been systematically assess...
Main authors: | Zheng, Hou-Feng; Rong, Jing-Jing; Liu, Ming; Han, Fang; Zhang, Xing-Wei; Richards, J. Brent; Wang, Li |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4306552/ https://www.ncbi.nlm.nih.gov/pubmed/25621886 http://dx.doi.org/10.1371/journal.pone.0116487 |
_version_ | 1782354345354854400 |
---|---|
author | Zheng, Hou-Feng; Rong, Jing-Jing; Liu, Ming; Han, Fang; Zhang, Xing-Wei; Richards, J. Brent; Wang, Li |
author_sort | Zheng, Hou-Feng |
collection | PubMed |
description | Genotype imputation is now routinely applied in genome-wide association studies (GWAS) and meta-analyses. However, most imputations have been run using HapMap samples as the reference, and imputation of low frequency and rare variants (minor allele frequency (MAF) < 5%) has not been systematically assessed. With the emergence of next-generation sequencing, large reference panels (such as the 1000 Genomes panel) are available to facilitate imputation of these variants. Therefore, to estimate the performance of low frequency and rare variant imputation, we imputed 153 individuals, each of whom had data from 3 different genotyping arrays (317k, 610k and 1 million SNPs), to three different reference panels: the 1000 Genomes pilot March 2010 release (1KGpilot), the 1000 Genomes interim August 2010 release (1KGinterim), and the 1000 Genomes phase1 November 2010 and May 2011 release (1KGphase1), using IMPUTE version 2. The three releases of the 1000 Genomes data differ in sample size, ancestry diversity, number of variants and their frequency spectrum. We found that both reference panel and GWAS chip density affect the imputation of low frequency and rare variants. 1KGphase1 outperformed the other 2 panels, with a higher concordance rate, a higher proportion of well-imputed variants (info>0.4) and a higher mean info score in each MAF bin. Similarly, the 1M chip array outperformed the 610K and 317K arrays. However, for very rare variants (MAF≤0.3%), only 0–1% of the variants were well imputed. We conclude that the imputation of low frequency and rare variants improves with larger reference panels and higher density of genome-wide genotyping arrays. Yet, despite a large reference panel size and high genotyping density, very rare variants remain difficult to impute. |
format | Online Article Text |
id | pubmed-4306552 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4306552 2015-01-30 PLoS One Research Article Public Library of Science 2015-01-26 /pmc/articles/PMC4306552/ /pubmed/25621886 http://dx.doi.org/10.1371/journal.pone.0116487 Text en © 2015 Zheng et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
title | Performance of Genotype Imputation for Low Frequency and Rare Variants from the 1000 Genomes |
topic | Research Article |
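
The evaluation measures named in the description (proportion of well-imputed variants with info > 0.4, mean info score per MAF bin, and concordance rate) can be illustrated with a short sketch. This is not the authors' pipeline: the function names, the input layout, the 1% bin edge and the bin labels are assumptions made for illustration, and concordance is taken here simply as the fraction of matching genotype calls. Only the info > 0.4 threshold and the 0.3% and 5% MAF cut-offs come from the record.

```python
from bisect import bisect_left
from statistics import mean

# Upper edges of the MAF bins, as fractions. Only the 0.3% and 5% cut-offs
# appear in the description; the 1% edge and the labels are assumptions.
MAF_EDGES = [0.003, 0.01, 0.05]
BIN_LABELS = ["<=0.3%", "0.3-1%", "1-5%", ">5%"]
INFO_THRESHOLD = 0.4  # "well-imputed" means info > 0.4 in the description


def maf_bin(maf):
    """Return the label of the MAF bin that contains `maf`."""
    return BIN_LABELS[bisect_left(MAF_EDGES, maf)]


def summarize(variants):
    """Summarize (maf, info) pairs per MAF bin.

    Returns {label: {"n", "prop_well_imputed", "mean_info"}} for non-empty bins.
    """
    by_bin = {}
    for maf, info in variants:
        by_bin.setdefault(maf_bin(maf), []).append(info)
    return {
        label: {
            "n": len(infos),
            "prop_well_imputed": sum(i > INFO_THRESHOLD for i in infos) / len(infos),
            "mean_info": mean(infos),
        }
        for label, infos in by_bin.items()
    }


def concordance(genotyped, imputed):
    """Fraction of calls where the imputed genotype equals the genotyped one."""
    if len(genotyped) != len(imputed) or not genotyped:
        raise ValueError("need two non-empty genotype lists of equal length")
    matches = sum(g == i for g, i in zip(genotyped, imputed))
    return matches / len(genotyped)


if __name__ == "__main__":
    # Toy (maf, info) pairs, invented purely to exercise the functions.
    toy = [(0.002, 0.10), (0.003, 0.50), (0.02, 0.90), (0.03, 0.30), (0.20, 0.95)]
    for label, stats in summarize(toy).items():
        print(label, stats)
    print("concordance:", concordance([0, 1, 2, 0], [0, 1, 1, 0]))
```

Running `summarize` once per reference panel and array density would give per-bin figures of the kind the description compares, although the study's actual binning and concordance definition may differ from this sketch.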