Population Stratification in the Context of Diverse Epidemiologic Surveys Sans Genome-Wide Data

Bibliographic Details
Main Authors: Oetjens, Matthew T., Brown-Gentry, Kristin, Goodloe, Robert, Dilks, Holli H., Crawford, Dana C.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2016
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4858524/
https://www.ncbi.nlm.nih.gov/pubmed/27200085
http://dx.doi.org/10.3389/fgene.2016.00076
author Oetjens, Matthew T.
Brown-Gentry, Kristin
Goodloe, Robert
Dilks, Holli H.
Crawford, Dana C.
author_sort Oetjens, Matthew T.
collection PubMed
description Population stratification, or confounding by genetic ancestry, is a potential cause of false associations in genetic association studies. Estimation of and adjustment for genetic ancestry have become common practice, thanks in part to the availability of ancestry informative markers on genome-wide association study (GWAS) arrays. While array data are now widespread, they are not ubiquitous, as several large epidemiologic and clinic-based studies lack genome-wide data. One such large epidemiologic study lacking genome-wide data accessible to investigators is the National Health and Nutrition Examination Surveys (NHANES), a set of population-based cross-sectional surveys of Americans, linked to demographic, health, and lifestyle data, conducted by the Centers for Disease Control and Prevention. DNA samples (n = 14,998) were extracted from biospecimens from consented NHANES participants during 1991–1994 (NHANES III, phase 2) and 1999–2002 and represent three major self-identified racial/ethnic groups: non-Hispanic whites (n = 6,634), non-Hispanic blacks (n = 3,458), and Mexican Americans (n = 3,950). We, as the Epidemiologic Architecture for Genes Linked to Environment study, genotyped candidate gene and GWAS-identified index variants in NHANES as part of the larger Population Architecture using Genomics and Epidemiology I study for collaborative genetic association studies. To enable basic quality control, such as estimation of genetic ancestry to control for population stratification, in NHANES sans genome-wide data, we outline here strategies that use limited genetic data to identify the markers optimal for characterizing genetic ancestry. From among 411 and 295 autosomal SNPs available in NHANES III and NHANES 1999–2002, respectively, we demonstrate that markers with ancestry information can be identified to estimate global ancestry. Despite limited resolution, global genetic ancestry is highly correlated with self-identified race for the majority of participants, although less so for ethnicity. Overall, the strategies outlined here for a large epidemiologic study can be applied to other datasets that are accessible for genotype–phenotype studies but lack genome-wide data.
format Online
Article
Text
id pubmed-4858524
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-4858524 2016-05-19 Front Genet Genetics Frontiers Media S.A. 2016-05-06 /pmc/articles/PMC4858524/ /pubmed/27200085 http://dx.doi.org/10.3389/fgene.2016.00076 Text en Copyright © 2016 Oetjens, Brown-Gentry, Goodloe, Dilks and Crawford. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Population Stratification in the Context of Diverse Epidemiologic Surveys Sans Genome-Wide Data
title_sort population stratification in the context of diverse epidemiologic surveys sans genome-wide data
topic Genetics