Tracing Sub-Structure in the European American Population with PCA-Informative Markers

Genetic structure in the European American population reflects waves of migration and recent gene flow among different populations. This complex structure can introduce bias in genetic association studies. Using Principal Components Analysis (PCA), we analyze the structure of two independent Europea...

Bibliographic Details
Main authors: Paschou, Peristera, Drineas, Petros, Lewis, Jamey, Nievergelt, Caroline M., Nickerson, Deborah A., Smith, Joshua D., Ridker, Paul M., Chasman, Daniel I., Krauss, Ronald M., Ziv, Elad
Format: Text
Language: English
Published: Public Library of Science 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2537989/
https://www.ncbi.nlm.nih.gov/pubmed/18797516
http://dx.doi.org/10.1371/journal.pgen.1000114
author Paschou, Peristera
Drineas, Petros
Lewis, Jamey
Nievergelt, Caroline M.
Nickerson, Deborah A.
Smith, Joshua D.
Ridker, Paul M.
Chasman, Daniel I.
Krauss, Ronald M.
Ziv, Elad
author_sort Paschou, Peristera
collection PubMed
description Genetic structure in the European American population reflects waves of migration and recent gene flow among different populations. This complex structure can introduce bias in genetic association studies. Using Principal Components Analysis (PCA), we analyze the structure of two independent European American datasets (1,521 individuals–307,315 autosomal SNPs). Individual variation lies across a continuum with some individuals showing high degrees of admixture with non-European populations, as demonstrated through joint analysis with HapMap data. The CEPH Europeans only represent a small fraction of the variation encountered in the larger European American datasets we studied. We interpret the first eigenvector of this data as correlated with ancestry, and we apply an algorithm that we have previously described to select PCA-informative markers (PCAIMs) that can reproduce this structure. Importantly, we develop a novel method that can remove redundancy from the selected SNP panels and show that we can effectively remove correlated markers, thus increasing genotyping savings. Only 150–200 PCAIMs suffice to accurately predict fine structure in European American datasets, as identified by PCA. Simulating association studies, we couple our method with a PCA-based stratification correction tool and demonstrate that a small number of PCAIMs can efficiently remove false correlations with almost no loss in power. The structure informative SNPs that we propose are an important resource for genetic association studies of European Americans. Furthermore, our redundancy removal algorithm can be applied on sets of ancestry informative markers selected with any method in order to select the most uncorrelated SNPs, and significantly decreases genotyping costs.
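The description above mentions removing correlated markers from a selected SNP panel. As a rough illustration only, and not the authors' redundancy removal algorithm (which this record does not specify), the sketch below swaps in a simple greedy filter on pairwise squared correlation. The names pearson_r2 and prune_correlated, the genotype coding 0/1/2, and the threshold r2_max=0.8 are all assumptions made for this example.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length genotype vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A monomorphic SNP has no variance; treat it as uncorrelated.
        return 0.0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def prune_correlated(genotypes, ranked_snps, r2_max=0.8):
    """Greedily keep SNPs whose r^2 with every already-kept SNP is <= r2_max.

    genotypes   : dict mapping SNP id -> list of genotypes (0/1/2), one per individual
    ranked_snps : SNP ids ordered from most to least informative (hypothetical ranking)
    r2_max      : arbitrary illustrative threshold, not a value from the paper
    """
    kept = []
    for snp in ranked_snps:
        if all(pearson_r2(genotypes[snp], genotypes[k]) <= r2_max for k in kept):
            kept.append(snp)
    return kept


# Toy data: snp_b is an exact copy of snp_a, snp_c is uncorrelated with snp_a.
genotypes = {
    "snp_a": [0, 1, 2, 1, 0, 2],
    "snp_b": [0, 1, 2, 1, 0, 2],
    "snp_c": [1, 0, 1, 2, 1, 1],
}
panel = prune_correlated(genotypes, ["snp_a", "snp_b", "snp_c"])
print(panel)
```

Because the filter walks the ranked list in order, a higher-ranked marker is always preferred over a lower-ranked one it is correlated with; a matrix-based column selection method could make a different choice.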
format Text
id pubmed-2537989
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-25379892008-09-17 Tracing Sub-Structure in the European American Population with PCA-Informative Markers Paschou, Peristera Drineas, Petros Lewis, Jamey Nievergelt, Caroline M. Nickerson, Deborah A. Smith, Joshua D. Ridker, Paul M. Chasman, Daniel I. Krauss, Ronald M. Ziv, Elad PLoS Genet Research Article Genetic structure in the European American population reflects waves of migration and recent gene flow among different populations. This complex structure can introduce bias in genetic association studies. Using Principal Components Analysis (PCA), we analyze the structure of two independent European American datasets (1,521 individuals–307,315 autosomal SNPs). Individual variation lies across a continuum with some individuals showing high degrees of admixture with non-European populations, as demonstrated through joint analysis with HapMap data. The CEPH Europeans only represent a small fraction of the variation encountered in the larger European American datasets we studied. We interpret the first eigenvector of this data as correlated with ancestry, and we apply an algorithm that we have previously described to select PCA-informative markers (PCAIMs) that can reproduce this structure. Importantly, we develop a novel method that can remove redundancy from the selected SNP panels and show that we can effectively remove correlated markers, thus increasing genotyping savings. Only 150–200 PCAIMs suffice to accurately predict fine structure in European American datasets, as identified by PCA. Simulating association studies, we couple our method with a PCA-based stratification correction tool and demonstrate that a small number of PCAIMs can efficiently remove false correlations with almost no loss in power. The structure informative SNPs that we propose are an important resource for genetic association studies of European Americans. 
Furthermore, our redundancy removal algorithm can be applied on sets of ancestry informative markers selected with any method in order to select the most uncorrelated SNPs, and significantly decreases genotyping costs. Public Library of Science 2008-07-04 /pmc/articles/PMC2537989/ /pubmed/18797516 http://dx.doi.org/10.1371/journal.pgen.1000114 Text en Paschou et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Tracing Sub-Structure in the European American Population with PCA-Informative Markers
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2537989/
https://www.ncbi.nlm.nih.gov/pubmed/18797516
http://dx.doi.org/10.1371/journal.pgen.1000114