Imputation-Aware Tag SNP Selection To Improve Power for Large-Scale, Multi-ethnic Association Studies

The emergence of very large cohorts in genomic research has facilitated a focus on genotype-imputation strategies to power rare variant association. These strategies have benefited from improvements in imputation methods and association tests, however little attention has been paid to ways in which...

Bibliographic Details
Main authors: Wojcik, Genevieve L., Fuchsberger, Christian, Taliun, Daniel, Welch, Ryan, Martin, Alicia R, Shringarpure, Suyash, Carlson, Christopher S., Abecasis, Goncalo, Kang, Hyun Min, Boehnke, Michael, Bustamante, Carlos D., Gignoux, Christopher R., Kenny, Eimear E.
Format: Online Article Text
Language: English
Published: Genetics Society of America 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6169386/
https://www.ncbi.nlm.nih.gov/pubmed/30131328
http://dx.doi.org/10.1534/g3.118.200502
author Wojcik, Genevieve L.
Fuchsberger, Christian
Taliun, Daniel
Welch, Ryan
Martin, Alicia R
Shringarpure, Suyash
Carlson, Christopher S.
Abecasis, Goncalo
Kang, Hyun Min
Boehnke, Michael
Bustamante, Carlos D.
Gignoux, Christopher R.
Kenny, Eimear E.
author_sort Wojcik, Genevieve L.
collection PubMed
description The emergence of very large cohorts in genomic research has facilitated a focus on genotype-imputation strategies to power rare variant association. These strategies have benefited from improvements in imputation methods and association tests; however, little attention has been paid to ways in which array design can increase rare variant association power. Therefore, we developed a novel framework to select tag SNPs using the reference panel of 26 populations from Phase 3 of the 1000 Genomes Project. We evaluate tag SNP performance via mean imputed r² at untyped sites using leave-one-out internal validation and standard imputation methods, rather than pairwise linkage disequilibrium. Moving beyond pairwise metrics allows us to account for haplotype diversity across the genome to improve imputation accuracy, and it demonstrates population-specific biases in pairwise estimates. We also examine array design strategies that contrast multi-ethnic cohorts vs. single populations, and show that a boost in performance for the former can be obtained by prioritizing tag SNPs that contribute information across multiple populations simultaneously. Using our framework, we demonstrate increased imputation accuracy for rare variants (frequency < 1%) by 0.5–3.1% for an array of one million sites and 0.7–7.1% for an array of 500,000 sites, depending on the population. Finally, we show how recent explosive growth in non-African populations means tag SNPs capture on average 30% fewer other variants than in African populations. The unified framework presented here will enable investigators to make informed decisions for the design of new arrays, and help empower the next phase of rare variant association for global health.
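The evaluation metric named in the description, mean imputed r² at untyped sites, can be illustrated with a minimal sketch. It assumes r² at a site is the squared Pearson correlation between true genotypes and imputed dosages, which is a common convention but is not confirmed by this record. The function names and the toy data layout (a dict of site to per-sample values) are hypothetical, and the imputation step and the leave-one-out procedure are assumed to have been run elsewhere.

```python
from statistics import mean


def squared_correlation(truth, dosage):
    """Squared Pearson correlation between true genotypes and imputed dosages.

    Returns None when either vector has zero variance, since the
    correlation is undefined in that case.
    """
    if len(truth) != len(dosage) or len(truth) < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    mean_t, mean_d = mean(truth), mean(dosage)
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(truth, dosage))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_d = sum((d - mean_d) ** 2 for d in dosage)
    if var_t == 0 or var_d == 0:
        return None
    return (cov * cov) / (var_t * var_d)


def mean_imputed_r2(true_genotypes, imputed_dosages):
    """Average r2 over untyped sites, skipping sites where r2 is undefined.

    Both arguments map a site identifier to a list of per-sample values
    (hypothetical layout chosen for this sketch).
    """
    values = []
    for site, truth in true_genotypes.items():
        r2 = squared_correlation(truth, imputed_dosages[site])
        if r2 is not None:
            values.append(r2)
    return mean(values) if values else None


# Toy example: one perfectly imputed site and one uninformative site.
true_genotypes = {"site_a": [0, 1, 2, 1], "site_b": [0, 2, 0, 2]}
imputed_dosages = {"site_a": [0.0, 1.0, 2.0, 1.0], "site_b": [0.0, 0.0, 2.0, 2.0]}
summary = mean_imputed_r2(true_genotypes, imputed_dosages)
```

Under this convention, a candidate set of tag SNPs would be scored by imputing the sites left off the array and comparing the resulting summary values, rather than by counting pairwise linkage disequilibrium partners.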
format Online
Article
Text
id pubmed-6169386
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Genetics Society of America
record_format MEDLINE/PubMed
spelling pubmed-6169386 2018-10-04 G3 (Bethesda) Investigations Genetics Society of America 2018-08-25 /pmc/articles/PMC6169386/ /pubmed/30131328 http://dx.doi.org/10.1534/g3.118.200502 Text en Copyright © 2018 Wojcik et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Imputation-Aware Tag SNP Selection To Improve Power for Large-Scale, Multi-ethnic Association Studies
topic Investigations