Systematic Removal of Outliers to Reduce Heterogeneity in Case-Control Association Studies


Bibliographic Details
Main Authors: Shen, Yuanyuan; Liu, Zhe; Ott, Jurg
Format: Text
Language: English
Published: S. Karger AG 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2975732/
https://www.ncbi.nlm.nih.gov/pubmed/20924194
http://dx.doi.org/10.1159/000320422
Description
Summary: BACKGROUND/AIMS: In human case-control association studies, population heterogeneity is often present and can lead to increased false-positive results. Various methods have been proposed and are in current use to remedy this situation. METHODS: We assume that heterogeneity is due to a relatively small number of individuals whose allele frequencies differ from those of the remainder of the sample. For this situation, we propose a new method of handling heterogeneity by removing outliers in a controlled manner. In a coordinate system of the c largest principal components in multidimensional scaling (MDS), we systematically remove one after another of the most extreme outlying individuals and each time recompute the largest association test statistic. The smallest p value obtained within M removals serves as our test statistic whose significance level is assessed in randomization samples. RESULTS: In power simulations of our method and three methods in current use, averaged over several different scenarios, the best method turned out to be logistic regression analysis (based on all individuals) with MDS components as covariates. CONCLUSION: Our proposed method ranked closely behind logistic regression analysis with MDS components but ahead of other commonly used approaches. In analyses of real datasets our method performed best.
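
The procedure outlined under METHODS can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details the summary leaves open are filled in by assumption: the MDS coordinates are taken from classical MDS on Euclidean distances (computed here through an SVD of the centered genotype matrix), outliers are ranked once by distance from the coordinate-wise median, the association test is an Armitage-type trend statistic (sample size times squared correlation, 1 degree of freedom), and randomization permutes the case/control labels. All function and parameter names are hypothetical.

```python
import math
import numpy as np

def mds_coords(G, c):
    """Top-c coordinates of individuals (rows of G).

    Assumption: classical MDS on Euclidean distances, which gives the
    same coordinates as a PCA of the column-centered genotype matrix.
    """
    X = G - G.mean(axis=0)
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :c] * s[:c]

def removal_order(coords, M):
    """Indices of the M most extreme individuals, most extreme first.

    Assumption: 'extreme' means far from the coordinate-wise median.
    """
    d = np.linalg.norm(coords - np.median(coords, axis=0), axis=1)
    return np.argsort(-d)[:M]

def max_trend_stat(G, y):
    """Largest trend statistic N * r^2 over all SNPs (columns of G)."""
    Gc = G - G.mean(axis=0)
    yc = y - y.mean()
    num = (Gc * yc[:, None]).sum(axis=0) ** 2
    den = (Gc ** 2).sum(axis=0) * (yc ** 2).sum()
    # Monomorphic SNPs (den == 0) contribute a statistic of zero.
    r2 = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return len(y) * r2.max()

def min_p_over_removals(G, y, order):
    """Smallest nominal p value over 0, 1, ..., len(order) removals."""
    keep = np.ones(len(y), dtype=bool)
    best = 1.0
    for k in range(len(order) + 1):  # k = 0 uses the full sample
        if k > 0:
            keep[order[k - 1]] = False
        stat = max_trend_stat(G[keep], y[keep])
        # Upper tail of a chi-square distribution with 1 df.
        best = min(best, math.erfc(math.sqrt(stat / 2.0)))
    return best

def outlier_removal_test(G, y, c=2, M=5, n_perm=199, seed=0):
    """Return (observed minimum p, randomization-based p value)."""
    G = np.asarray(G, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    # The removal order depends only on genotypes, so it is computed once.
    order = removal_order(mds_coords(G, c), M)
    observed = min_p_over_removals(G, y, order)
    null = [min_p_over_removals(G, rng.permutation(y), order)
            for _ in range(n_perm)]
    p_emp = (1 + sum(p <= observed for p in null)) / (1 + n_perm)
    return observed, p_emp
```

With `G` an individuals-by-SNPs array of 0/1/2 genotype counts and `y` a 0/1 case-control vector, `outlier_removal_test(G, y, c=2, M=5)` would return the smallest nominal p value over the removals together with its randomization-based significance level. Whether the published method recomputes the MDS coordinates after each removal is not stated in the summary; this sketch does not.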