Utilizing Genotype Imputation for the Augmentation of Sequence Data

BACKGROUND: In recent years, capabilities for genotyping large sets of single nucleotide polymorphisms (SNPs) have increased considerably with the ability to genotype over 1 million SNP markers across the genome. This advancement in technology has led to an increase in the number of genome-wide assoc...

Bibliographic Details
Main authors: Fridley, Brooke L., Jenkins, Gregory, Deyo-Svendsen, Matthew E., Hebbring, Scott, Freimuth, Robert
Format: Text
Language: English
Published: Public Library of Science 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2882389/
https://www.ncbi.nlm.nih.gov/pubmed/20543988
http://dx.doi.org/10.1371/journal.pone.0011018
author Fridley, Brooke L.
Jenkins, Gregory
Deyo-Svendsen, Matthew E.
Hebbring, Scott
Freimuth, Robert
collection PubMed
description BACKGROUND: In recent years, capabilities for genotyping large sets of single nucleotide polymorphisms (SNPs) have increased considerably with the ability to genotype over 1 million SNP markers across the genome. This advancement in technology has led to an increase in the number of genome-wide association studies (GWAS) for various complex traits. These GWAS have implicated over 1500 SNPs as associated with disease traits. However, the SNPs identified from these GWAS are not necessarily the functional variants. Therefore, the next phase in GWAS will involve the refining of these putative loci. METHODOLOGY: A next step for GWAS would be to catalog all variants, especially rarer variants, within the detected loci, followed by the association analysis of the detected variants with the disease trait. However, sequencing a locus in a large number of subjects is still relatively expensive. A more cost-effective approach would be to sequence a portion of the individuals, followed by the application of genotype imputation methods for imputing markers in the remaining individuals. A potentially attractive alternative option would be to impute based on the 1000 Genomes Project; however, this has the drawback of using a reference population that does not necessarily match the disease status and LD pattern of the study population. We explored a variety of approaches for carrying out the imputation using a reference panel consisting of sequence data for a fraction of the study participants, using data from both a candidate gene sequencing study and the 1000 Genomes Project. CONCLUSIONS: Imputation of genetic variation based on a proportion of sequenced samples is feasible. Our results indicate the following sequencing study design guidelines, which take advantage of the recent advances in genotype imputation methodology: Select the largest and most diverse reference panel for sequencing and genotype as many “anchor” markers as possible.
format Text
id pubmed-2882389
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2882389 2010-06-11 PLoS One Research Article Public Library of Science 2010-06-08 /pmc/articles/PMC2882389/ /pubmed/20543988 http://dx.doi.org/10.1371/journal.pone.0011018 Text en Fridley et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Utilizing Genotype Imputation for the Augmentation of Sequence Data
topic Research Article