Implication of next-generation sequencing on association studies
BACKGROUND: Next-generation sequencing technologies can effectively detect the entire spectrum of genomic variation and provide a powerful tool for systematic exploration of the universe of common, low frequency and rare variants in the entire genome. However, the current paradigm for genome-wide as...
Main authors: | Siu, Hoicheong; Zhu, Yun; Jin, Li; Xiong, Momiao |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2011 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3148210/ https://www.ncbi.nlm.nih.gov/pubmed/21682891 http://dx.doi.org/10.1186/1471-2164-12-322 |
_version_ | 1782209324284641280 |
---|---|
author | Siu, Hoicheong Zhu, Yun Jin, Li Xiong, Momiao |
author_facet | Siu, Hoicheong Zhu, Yun Jin, Li Xiong, Momiao |
author_sort | Siu, Hoicheong |
collection | PubMed |
description | BACKGROUND: Next-generation sequencing technologies can effectively detect the entire spectrum of genomic variation and provide a powerful tool for systematic exploration of the universe of common, low frequency and rare variants in the entire genome. However, the current paradigm for genome-wide association studies (GWAS) is to catalogue and genotype common variants (MAF > 5%). The methods and study designs for testing the association of low frequency (0.5% < MAF ≤ 5%) and rare variation (MAF ≤ 0.5%) have not been thoroughly investigated. The 1000 Genomes Project represents one such endeavour to characterize the human genetic variation pattern at the MAF = 1% level as a foundation for association studies. In this report, we explore different strategies and study designs for the near future GWAS in the post-era, based on both the low coverage pilot data and the exon pilot data from the 1000 Genomes Project. RESULTS: We investigated the linkage disequilibrium (LD) pattern among common and low frequency SNPs and its implication for association studies. We found that the LD between pairs of low frequency alleles, and between low frequency and common alleles, is much weaker than the LD between pairs of common alleles. We examined various tagging designs with and without statistical imputation approaches and compared their power against de novo resequencing in mapping causal variants under various disease models. We used the low coverage pilot data, which contain ~14 M SNPs, as a hypothetical genotype-array platform (Pilot 14 M) to interrogate its impact on the selection of tag SNPs, mapping coverage and power of association tests. We found that, even after imputation, 45.4% of low frequency SNPs remained untaggable and only 67.7% of the low frequency variation was covered by the Pilot 14 M array. CONCLUSIONS: These results suggest that GWAS based on SNP arrays would be ill-suited for association studies of low frequency variation. |
format | Online Article Text |
id | pubmed-3148210 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3148210 2011-08-02 Implication of next-generation sequencing on association studies Siu, Hoicheong Zhu, Yun Jin, Li Xiong, Momiao BMC Genomics Research Article BioMed Central 2011-06-17 /pmc/articles/PMC3148210/ /pubmed/21682891 http://dx.doi.org/10.1186/1471-2164-12-322 Text en Copyright ©2011 Siu et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Siu, Hoicheong Zhu, Yun Jin, Li Xiong, Momiao Implication of next-generation sequencing on association studies |
title | Implication of next-generation sequencing on association studies |
title_full | Implication of next-generation sequencing on association studies |
title_fullStr | Implication of next-generation sequencing on association studies |
title_full_unstemmed | Implication of next-generation sequencing on association studies |
title_short | Implication of next-generation sequencing on association studies |
title_sort | implication of next-generation sequencing on association studies |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3148210/ https://www.ncbi.nlm.nih.gov/pubmed/21682891 http://dx.doi.org/10.1186/1471-2164-12-322 |
work_keys_str_mv | AT siuhoicheong implicationofnextgenerationsequencingonassociationstudies AT zhuyun implicationofnextgenerationsequencingonassociationstudies AT jinli implicationofnextgenerationsequencingonassociationstudies AT xiongmomiao implicationofnextgenerationsequencingonassociationstudies |
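
The abstract's frequency classes and its pairwise LD comparison can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the 0/1 haplotype coding, and the choice of r² as the LD measure are assumptions made for illustration. Only the MAF thresholds (5% and 0.5%) come from the record above.

```python
from typing import Sequence


def minor_allele_frequency(haplotypes: Sequence[int]) -> float:
    """Return the MAF of one biallelic SNP coded 0/1 across haplotypes."""
    freq = sum(haplotypes) / len(haplotypes)
    return min(freq, 1.0 - freq)


def frequency_class(maf: float) -> str:
    """Bin a MAF using the thresholds quoted in the abstract."""
    if maf > 0.05:
        return "common"          # MAF > 5%
    if maf > 0.005:
        return "low frequency"   # 0.5% < MAF <= 5%
    return "rare"                # MAF <= 0.5%


def r_squared(snp_a: Sequence[int], snp_b: Sequence[int]) -> float:
    """Pairwise LD (r^2) from phased 0/1 haplotypes at two SNPs.

    Assumption: r^2 is used here as the LD measure; the record does not
    say which statistic the study used.
    """
    n = len(snp_a)
    p_a = sum(snp_a) / n
    p_b = sum(snp_b) / n
    # Frequency of haplotypes carrying allele 1 at both SNPs.
    p_ab = sum(a * b for a, b in zip(snp_a, snp_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # monomorphic site: r^2 is undefined, reported as 0 here
    d = p_ab - p_a * p_b
    return d * d / denom


if __name__ == "__main__":
    # Hypothetical toy haplotypes, not data from the 1000 Genomes Project.
    snp_1 = [1, 0, 1, 0]
    snp_2 = [1, 1, 0, 0]
    print(frequency_class(minor_allele_frequency(snp_1)))
    print(r_squared(snp_1, snp_1), r_squared(snp_1, snp_2))
```

A tag SNP design typically keeps one SNP per group of SNPs whose pairwise r² exceeds a chosen threshold, which is why weak LD around low frequency alleles leaves many of them untaggable.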