
Choice of population structure informative principal components for adjustment in a case-control study

BACKGROUND: There are many ways to perform adjustment for population structure. It remains unclear what the optimal approach is and whether the optimal approach varies by the type of samples and substructure present. The simplest and most straightforward approach is to adjust for the continuous prin...

Bibliographic Details
Main Authors:	Peloso, Gina M, Lunetta, Kathryn L
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3150322/ https://www.ncbi.nlm.nih.gov/pubmed/21771328 http://dx.doi.org/10.1186/1471-2156-12-64

author	Peloso, Gina M Lunetta, Kathryn L
collection	PubMed
description	BACKGROUND: There are many ways to perform adjustment for population structure. It remains unclear what the optimal approach is and whether the optimal approach varies by the type of samples and substructure present. The simplest and most straightforward approach is to adjust for the continuous principal components (PCs) that capture ancestry. Through simulation, we explored the issue of which ancestry informative PCs should be adjusted for in an association model to control for the confounding nature of population structure while maintaining maximum power. A thorough examination of selecting PCs for adjustment in a case-control study across the possible structure scenarios that could occur in a genome-wide association study has not been previously reported. RESULTS: We found that when the SNP and phenotype frequencies do not vary over the sub-populations, all methods of selection provided similar power and appropriate Type I error for association. When the SNP is not structured and the phenotype has large structure, then selection methods that do not select PCs for inclusion as covariates generally provide the most power. When there is a structured SNP and a non-structured phenotype, selection methods that include PCs in the model have greater power. When both the SNP and the phenotype are structured, all methods of selection have similar power. CONCLUSIONS: Standard practice is to include a fixed number of PCs in genome-wide association studies. Based on our findings, we conclude that if power is not a concern, then selecting the same set of top PCs for adjustment for all SNPs in logistic regression is a strategy that achieves appropriate Type I error. However, standard practice is not optimal in all scenarios and to optimize power for structured SNPs in the presence of unstructured phenotypes, PCs that are associated with the tested SNP should be included in the logistic model.
format	Online Article Text
id	pubmed-3150322
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3150322 2011-08-05 Choice of population structure informative principal components for adjustment in a case-control study Peloso, Gina M Lunetta, Kathryn L BMC Genet Research Article BioMed Central 2011-07-19 /pmc/articles/PMC3150322/ /pubmed/21771328 http://dx.doi.org/10.1186/1471-2156-12-64 Text en Copyright ©2011 Peloso and Lunetta; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Choice of population structure informative principal components for adjustment in a case-control study
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3150322/ https://www.ncbi.nlm.nih.gov/pubmed/21771328 http://dx.doi.org/10.1186/1471-2156-12-64
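
The abstract's closing recommendation, that PCs associated with the tested SNP be included in the logistic model, can be illustrated with a minimal sketch. This is not the authors' code. Everything here is an assumption made for illustration: the function names, the Fisher z cutoff used as the selection rule, the toy two-subpopulation simulation, and the use of a noisy ancestry score as a stand-in for PC1 (a real analysis would compute PCs from genome-wide genotypes).

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def select_pcs_for_snp(snp, pcs, z_cut=1.96):
    """Indices of PCs correlated with the SNP (assumed rule: Fisher z cutoff)."""
    scale = math.sqrt(len(snp) - 3)
    chosen = []
    for j, pc in enumerate(pcs):
        r = max(-0.999999, min(0.999999, pearson_r(snp, pc)))
        if abs(math.atanh(r)) * scale > z_cut:
            chosen.append(j)
    return chosen


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(X, y, iters=25):
    """Logistic regression by Newton-Raphson; each row of X starts with 1.0."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for a in range(p):
                grad[a] += (yi - mu) * xi[a]
                for c in range(p):
                    hess[a][c] += mu * (1.0 - mu) * xi[a] * xi[c]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


def fit_adjusted(snp, pcs, y, chosen):
    """Fit case status on intercept, SNP genotype, and the chosen PCs."""
    X = [[1.0, float(g)] + [pcs[j][i] for j in chosen] for i, g in enumerate(snp)]
    return fit_logistic(X, y)


def simulate(n=600, seed=1):
    """Toy data: a SNP whose allele frequency differs between two subpopulations."""
    rng = random.Random(seed)
    snp, pc1, pc2, y = [], [], [], []
    for i in range(n):
        pop = i % 2
        freq = 0.7 if pop else 0.2
        g = (rng.random() < freq) + (rng.random() < freq)  # genotype 0, 1 or 2
        snp.append(g)
        pc1.append((1.0 if pop else -1.0) + rng.gauss(0, 0.3))  # ancestry stand-in
        pc2.append(rng.gauss(0, 1))                             # pure noise
        risk = 1.0 / (1.0 + math.exp(-(-0.5 + 0.3 * g)))
        y.append(1 if rng.random() < risk else 0)
    return snp, [pc1, pc2], y


snp, pcs, y = simulate()
chosen = select_pcs_for_snp(snp, pcs)
beta = fit_adjusted(snp, pcs, y, chosen)
print("PCs selected for this SNP:", chosen)
print("Adjusted SNP log-odds ratio:", round(beta[1], 3))
```

Selecting per SNP, rather than fixing one set of top PCs for every SNP, is the design choice the abstract contrasts with standard practice. The cutoff of 1.96 is arbitrary here and would need tuning in practice.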
