Clustering by genetic ancestry using genome-wide SNP data

BACKGROUND: Population stratification can cause spurious associations in a genome-wide association study (GWAS), and occurs when differences in allele frequencies of single nucleotide polymorphisms (SNPs) are due to ancestral differences between cases and controls rather than the trait of interest....

Bibliographic Details
Main authors: Solovieff, Nadia, Hartley, Stephen W, Baldwin, Clinton T, Perls, Thomas T, Steinberg, Martin H, Sebastiani, Paola
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3018397/
https://www.ncbi.nlm.nih.gov/pubmed/21143920
http://dx.doi.org/10.1186/1471-2156-11-108
author Solovieff, Nadia
Hartley, Stephen W
Baldwin, Clinton T
Perls, Thomas T
Steinberg, Martin H
Sebastiani, Paola
collection PubMed
description BACKGROUND: Population stratification can cause spurious associations in a genome-wide association study (GWAS). It occurs when differences in allele frequencies of single nucleotide polymorphisms (SNPs) are due to ancestral differences between cases and controls rather than to the trait of interest. Principal components analysis (PCA) is the established approach for detecting population substructure in genome-wide data and for adjusting the genetic association for stratification by including the top principal components (PCs) in the analysis. An alternative solution is genetic matching of cases and controls, which, however, requires well-defined population strata for appropriate selection of cases and controls.
RESULTS: We developed a novel algorithm to cluster individuals into groups with similar ancestral backgrounds based on the principal components computed by PCA. We demonstrate the effectiveness of our algorithm in real and simulated data, and show that matching cases and controls using the clusters assigned by the algorithm substantially reduces population stratification bias. Through simulation, we show that the power of our method is higher than adjustment for PCs in certain situations.
CONCLUSIONS: In addition to reducing population stratification bias and improving power, matching creates a clean dataset free of population stratification, which can then be used to build prediction models without including variables to adjust for ancestry. The cluster assignments also allow for the estimation of genetic heterogeneity by examining cluster-specific effects.
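The workflow summarized in the abstract (compute principal components, cluster individuals on them, then match cases and controls within clusters) can be illustrated with a small sketch. The abstract does not describe the authors' clustering algorithm, so plain k-means with a deterministic farthest-first start is swapped in here purely as a stand-in. The function names, the two-population toy data, the exaggerated allele frequencies (0.2 versus 0.8), and the 1:1 random matching rule are all assumptions made for illustration, not details taken from the article.

```python
import numpy as np


def top_pcs(genotypes, n_pcs=2):
    """Top principal-component scores of a samples x SNPs genotype matrix."""
    centered = genotypes - genotypes.mean(axis=0)
    # SVD of the centered matrix; u * s gives the PC scores per individual.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


def farthest_first_centers(points, k):
    """Deterministic start: first point, then repeatedly the farthest point."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    return np.array(centers, dtype=float)


def kmeans(points, k, n_iter=20):
    """Plain Lloyd's k-means (a stand-in, NOT the article's algorithm)."""
    centers = farthest_first_centers(points, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def match_within_clusters(labels, is_case, seed=0):
    """Keep equal numbers of cases and controls in each cluster (1:1, random)."""
    rng = np.random.default_rng(seed)
    keep = []
    for c in np.unique(labels):
        cases = np.flatnonzero((labels == c) & is_case)
        controls = np.flatnonzero((labels == c) & ~is_case)
        n = min(len(cases), len(controls))
        if n == 0:
            continue  # a cluster with only cases or only controls is dropped
        keep.extend(rng.choice(cases, size=n, replace=False))
        keep.extend(rng.choice(controls, size=n, replace=False))
    return np.sort(np.array(keep, dtype=int))


# Toy data (assumed): two ancestral populations with very different allele
# frequencies, and case status that is more common in the second population.
rng = np.random.default_rng(1)
n_per_pop, n_snps = 100, 300
true_pop = np.repeat([0, 1], n_per_pop)
allele_freq = np.where(true_pop[:, None] == 0, 0.2, 0.8) * np.ones((1, n_snps))
genotypes = rng.binomial(2, allele_freq).astype(float)
is_case = rng.random(2 * n_per_pop) < np.where(true_pop == 0, 0.3, 0.7)

labels = kmeans(top_pcs(genotypes, n_pcs=2), k=2)
matched = match_within_clusters(labels, is_case)

for c in np.unique(labels):
    in_cluster = matched[labels[matched] == c]
    print(c, int(is_case[in_cluster].sum()), int((~is_case[in_cluster]).sum()))
```

In the unmatched toy data, case status is confounded with ancestry because cases are concentrated in one population. After matching, each cluster contributes the same number of cases and controls, so ancestry is balanced between the two groups by construction. A real analysis would need a principled choice of the number of PCs and clusters, which this sketch fixes by hand.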
format Text
id pubmed-3018397
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3018397 2011-01-24 Clustering by genetic ancestry using genome-wide SNP data Solovieff, Nadia Hartley, Stephen W Baldwin, Clinton T Perls, Thomas T Steinberg, Martin H Sebastiani, Paola BMC Genet Methodology Article BioMed Central 2010-12-09 /pmc/articles/PMC3018397/ /pubmed/21143920 http://dx.doi.org/10.1186/1471-2156-11-108 Text en Copyright ©2010 Solovieff et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Clustering by genetic ancestry using genome-wide SNP data
topic Methodology Article
work_keys_str_mv AT solovieffnadia clusteringbygeneticancestryusinggenomewidesnpdata
AT hartleystephenw clusteringbygeneticancestryusinggenomewidesnpdata
AT baldwinclintont clusteringbygeneticancestryusinggenomewidesnpdata
AT perlsthomast clusteringbygeneticancestryusinggenomewidesnpdata
AT steinbergmartinh clusteringbygeneticancestryusinggenomewidesnpdata
AT sebastianipaola clusteringbygeneticancestryusinggenomewidesnpdata