
A Cautionary Note on the Effects of Population Stratification Under an Extreme Phenotype Sampling Design

Extreme phenotype sampling (EPS) is a popular study design used to reduce genotyping or sequencing costs. Assuming continuous phenotype data are available on a large cohort, EPS involves genotyping or sequencing only those individuals with extreme phenotypic values. Although this design has been sho...


Bibliographic Details
Main Authors: Panarella, Michela; Burkett, Kelly M.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6509877/
https://www.ncbi.nlm.nih.gov/pubmed/31130982
http://dx.doi.org/10.3389/fgene.2019.00398
author Panarella, Michela
Burkett, Kelly M.
collection PubMed
description Extreme phenotype sampling (EPS) is a popular study design used to reduce genotyping or sequencing costs. Assuming continuous phenotype data are available on a large cohort, EPS involves genotyping or sequencing only those individuals with extreme phenotypic values. Although this design has been shown to have high power to detect genetic effects even at smaller sample sizes, little attention has been paid to the effects of confounding variables, and in particular population stratification. Using extensive simulations, we demonstrate that the false positive rate under the EPS design is greatly inflated relative to a random sample of equal size or a “case-control”-like design where the cases are from one phenotypic extreme and the controls randomly sampled. The inflated false positive rate is observed even with allele frequency and phenotype mean differences taken from European population data. We show that the effects of confounding are not reduced by increasing the sample size. We also show that including the top principal components in a logistic regression model is sufficient for controlling the type 1 error rate using data simulated with a population genetics model and using 1,000 Genomes genotype data. Our results suggest that when an EPS study is conducted, it is crucial to adjust for all confounding variables. For genetic association studies this requires genotyping a sufficient number of markers to allow for ancestry estimation. Unfortunately, this could increase the costs of a study if sequencing or genotyping was only planned for candidate genes or pathways; the available genetic data would not be suitable for ancestry correction as many of the variants could have a true association with the trait.
format Online
Article
Text
id pubmed-6509877
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-6509877 2019-05-24 A Cautionary Note on the Effects of Population Stratification Under an Extreme Phenotype Sampling Design Panarella, Michela Burkett, Kelly M. Front Genet Genetics Frontiers Media S.A. 
2019-05-03 /pmc/articles/PMC6509877/ /pubmed/31130982 http://dx.doi.org/10.3389/fgene.2019.00398 Text en Copyright © 2019 Panarella and Burkett. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title A Cautionary Note on the Effects of Population Stratification Under an Extreme Phenotype Sampling Design
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6509877/
https://www.ncbi.nlm.nih.gov/pubmed/31130982
http://dx.doi.org/10.3389/fgene.2019.00398
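
The confounding mechanism summarized in the description can be illustrated with a small simulation. The sketch below is not the authors' code and makes several simplifying assumptions: two equally sized subpopulations, hypothetical allele frequencies (0.30 and 0.45) and phenotype means (0 and 0.5), a marker with no effect on the phenotype, and a linear-model Wald test with a normal approximation in place of the logistic regression used in the article. The known subpopulation label stands in for the top principal components. The function names `wald_p` and `simulate` are illustrative only.

```python
import math
import numpy as np


def wald_p(y, X, j):
    """Two-sided normal-approximation p-value for coefficient j in an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = (resid @ resid) / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    z = beta[j] / math.sqrt(cov[j, j])
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n_reps=200, cohort=10_000, n=500, alpha=0.05, seed=1):
    """Estimate false positive rates for a null marker under three analyses."""
    rng = np.random.default_rng(seed)
    freq = np.array([0.30, 0.45])  # hypothetical allele frequency per subpopulation
    mean = np.array([0.0, 0.5])    # hypothetical phenotype mean per subpopulation
    hits = {"random": 0, "eps": 0, "eps_adjusted": 0}
    half = n // 2
    for _ in range(n_reps):
        pop = rng.integers(0, 2, size=cohort)          # subpopulation label
        g = rng.binomial(2, freq[pop]).astype(float)   # genotype, no causal effect
        y = rng.normal(mean[pop], 1.0)                 # phenotype depends on pop only

        # Random sample of size n: regress phenotype on genotype.
        idx = rng.choice(cohort, size=n, replace=False)
        X = np.column_stack([np.ones(n), g[idx]])
        hits["random"] += wald_p(y[idx], X, 1) < alpha

        # EPS: keep the n/2 lowest and n/2 highest phenotypes,
        # then compare genotypes between the two extremes.
        order = np.argsort(y)
        ext = np.concatenate([order[:half], order[-half:]])
        high = np.concatenate([np.zeros(half), np.ones(half)])
        X = np.column_stack([np.ones(2 * half), high])
        hits["eps"] += wald_p(g[ext], X, 1) < alpha

        # Same EPS sample, adjusting for the subpopulation label
        # (a stand-in for ancestry principal components).
        X = np.column_stack([np.ones(2 * half), high, pop[ext]])
        hits["eps_adjusted"] += wald_p(g[ext], X, 1) < alpha
    return {k: v / n_reps for k, v in hits.items()}


if __name__ == "__main__":
    for design, rate in simulate().items():
        print(f"{design}: false positive rate = {rate:.3f}")
```

Under these assumed parameters, the unadjusted EPS analysis should reject the null more often than the random-sample analysis, and adjusting for the subpopulation label should bring the rate back down. The exact rates depend on the hypothetical settings chosen here and are not results from the article.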