Tracing Sub-Structure in the European American Population with PCA-Informative Markers
Genetic structure in the European American population reflects waves of migration and recent gene flow among different populations. This complex structure can introduce bias in genetic association studies. Using Principal Components Analysis (PCA), we analyze the structure of two independent Europea...
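Main Authors: | Paschou, Peristera; Drineas, Petros; Lewis, Jamey; Nievergelt, Caroline M.; Nickerson, Deborah A.; Smith, Joshua D.; Ridker, Paul M.; Chasman, Daniel I.; Krauss, Ronald M.; Ziv, Elad |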
---|---|
Format: | Text |
Language: | English |
Published: | Public Library of Science 2008 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2537989/ https://www.ncbi.nlm.nih.gov/pubmed/18797516 http://dx.doi.org/10.1371/journal.pgen.1000114 |
_version_ | 1782159111633240064 |
---|---|
author | Paschou, Peristera Drineas, Petros Lewis, Jamey Nievergelt, Caroline M. Nickerson, Deborah A. Smith, Joshua D. Ridker, Paul M. Chasman, Daniel I. Krauss, Ronald M. Ziv, Elad |
author_facet | Paschou, Peristera Drineas, Petros Lewis, Jamey Nievergelt, Caroline M. Nickerson, Deborah A. Smith, Joshua D. Ridker, Paul M. Chasman, Daniel I. Krauss, Ronald M. Ziv, Elad |
author_sort | Paschou, Peristera |
collection | PubMed |
description | Genetic structure in the European American population reflects waves of migration and recent gene flow among different populations. This complex structure can introduce bias in genetic association studies. Using Principal Components Analysis (PCA), we analyze the structure of two independent European American datasets (1,521 individuals–307,315 autosomal SNPs). Individual variation lies across a continuum with some individuals showing high degrees of admixture with non-European populations, as demonstrated through joint analysis with HapMap data. The CEPH Europeans only represent a small fraction of the variation encountered in the larger European American datasets we studied. We interpret the first eigenvector of this data as correlated with ancestry, and we apply an algorithm that we have previously described to select PCA-informative markers (PCAIMs) that can reproduce this structure. Importantly, we develop a novel method that can remove redundancy from the selected SNP panels and show that we can effectively remove correlated markers, thus increasing genotyping savings. Only 150–200 PCAIMs suffice to accurately predict fine structure in European American datasets, as identified by PCA. Simulating association studies, we couple our method with a PCA-based stratification correction tool and demonstrate that a small number of PCAIMs can efficiently remove false correlations with almost no loss in power. The structure informative SNPs that we propose are an important resource for genetic association studies of European Americans. Furthermore, our redundancy removal algorithm can be applied on sets of ancestry informative markers selected with any method in order to select the most uncorrelated SNPs, and significantly decreases genotyping costs. |
format | Text |
id | pubmed-2537989 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-2537989 2008-09-17 Tracing Sub-Structure in the European American Population with PCA-Informative Markers Paschou, Peristera Drineas, Petros Lewis, Jamey Nievergelt, Caroline M. Nickerson, Deborah A. Smith, Joshua D. Ridker, Paul M. Chasman, Daniel I. Krauss, Ronald M. Ziv, Elad PLoS Genet Research Article Genetic structure in the European American population reflects waves of migration and recent gene flow among different populations. This complex structure can introduce bias in genetic association studies. Using Principal Components Analysis (PCA), we analyze the structure of two independent European American datasets (1,521 individuals–307,315 autosomal SNPs). Individual variation lies across a continuum with some individuals showing high degrees of admixture with non-European populations, as demonstrated through joint analysis with HapMap data. The CEPH Europeans only represent a small fraction of the variation encountered in the larger European American datasets we studied. We interpret the first eigenvector of this data as correlated with ancestry, and we apply an algorithm that we have previously described to select PCA-informative markers (PCAIMs) that can reproduce this structure. Importantly, we develop a novel method that can remove redundancy from the selected SNP panels and show that we can effectively remove correlated markers, thus increasing genotyping savings. Only 150–200 PCAIMs suffice to accurately predict fine structure in European American datasets, as identified by PCA. Simulating association studies, we couple our method with a PCA-based stratification correction tool and demonstrate that a small number of PCAIMs can efficiently remove false correlations with almost no loss in power. The structure informative SNPs that we propose are an important resource for genetic association studies of European Americans. Furthermore, our redundancy removal algorithm can be applied on sets of ancestry informative markers selected with any method in order to select the most uncorrelated SNPs, and significantly decreases genotyping costs. Public Library of Science 2008-07-04 /pmc/articles/PMC2537989/ /pubmed/18797516 http://dx.doi.org/10.1371/journal.pgen.1000114 Text en Paschou et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Paschou, Peristera Drineas, Petros Lewis, Jamey Nievergelt, Caroline M. Nickerson, Deborah A. Smith, Joshua D. Ridker, Paul M. Chasman, Daniel I. Krauss, Ronald M. Ziv, Elad Tracing Sub-Structure in the European American Population with PCA-Informative Markers |
title | Tracing Sub-Structure in the European American Population with PCA-Informative Markers |
title_full | Tracing Sub-Structure in the European American Population with PCA-Informative Markers |
title_fullStr | Tracing Sub-Structure in the European American Population with PCA-Informative Markers |
title_full_unstemmed | Tracing Sub-Structure in the European American Population with PCA-Informative Markers |
title_short | Tracing Sub-Structure in the European American Population with PCA-Informative Markers |
title_sort | tracing sub-structure in the european american population with pca-informative markers |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2537989/ https://www.ncbi.nlm.nih.gov/pubmed/18797516 http://dx.doi.org/10.1371/journal.pgen.1000114 |
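
The description in this record mentions two steps: picking SNPs that reproduce the structure seen on the first principal component, and then removing correlated (redundant) markers from the panel. The snippet below is only a minimal sketch of that general idea, not the authors' algorithm. It assumes that a SNP's informativeness can be approximated by its squared loading on the first principal component, and it swaps in a simple greedy correlation filter for the redundancy step. All names (`center_columns`, `first_pc_loadings`, `select_markers`, `max_abs_corr`, `TOY_GENOTYPES`) are hypothetical, and the toy genotype matrix is made up for illustration.

```python
import random


def center_columns(genotypes):
    """Subtract each SNP's mean so every column has zero mean."""
    n = len(genotypes)
    m = len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in genotypes]


def first_pc_loadings(centered, iterations=200, seed=0):
    """Approximate the top right singular vector of X by power iteration on X^T X."""
    rng = random.Random(seed)
    m = len(centered[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(m)]
    for _ in range(iterations):
        # u = X v (one value per individual), then w = X^T u (one value per SNP)
        u = [sum(x * vj for x, vj in zip(row, v)) for row in centered]
        w = [sum(row[j] * ui for row, ui in zip(centered, u)) for j in range(m)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break  # degenerate input: no variance to describe
        v = [x / norm for x in w]
    return v


def correlation(centered, a, b):
    """Pearson correlation between SNP columns a and b (0.0 if either is constant)."""
    dot = sum(row[a] * row[b] for row in centered)
    var_a = sum(row[a] ** 2 for row in centered)
    var_b = sum(row[b] ** 2 for row in centered)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return dot / (var_a * var_b) ** 0.5


def select_markers(genotypes, n_markers, max_abs_corr=0.8):
    """Rank SNPs by squared first-PC loading, then greedily skip redundant ones.

    Assumption: this scoring and this filter are illustrative stand-ins,
    not the method described in the paper.
    """
    centered = center_columns(genotypes)
    loadings = first_pc_loadings(centered)
    ranked = sorted(range(len(loadings)), key=lambda j: loadings[j] ** 2, reverse=True)
    chosen = []
    for j in ranked:
        if len(chosen) >= n_markers:
            break
        # Keep SNP j only if it is not too correlated with any SNP already kept.
        if all(abs(correlation(centered, j, k)) <= max_abs_corr for k in chosen):
            chosen.append(j)
    return chosen


# Made-up data: rows are individuals, columns are SNPs coded 0/1/2.
# SNP 0 and SNP 1 are identical (redundant), SNP 2 is partly informative,
# SNP 3 is constant and carries no information.
TOY_GENOTYPES = [
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 2, 1],
    [2, 2, 1, 1],
    [2, 2, 1, 1],
]

print(select_markers(TOY_GENOTYPES, 2))  # prints [0, 2]
```

In this toy case the duplicate SNP 1 is skipped in favour of SNP 2, which is the kind of genotyping saving the description refers to. A real analysis would use a numerical library and more than one principal component.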