Predictive modeling in case-control single-nucleotide polymorphism studies in the presence of population stratification: a case study using Genetic Analysis Workshop 16 Problem 1 dataset

In this paper, we apply the gradient-boosting machine predictive model to the rheumatoid arthritis data for predicting the case-control status. QQ-plot suggests severe population stratification. In univariate genome-wide association studies, a correction factor for ethnicity confounding can be deriv...
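
The abstract notes that a correction factor for ethnicity confounding can be derived in univariate genome-wide association studies. One widely used correction of this kind is genomic control, which rescales 1-df association statistics by an inflation factor λ estimated from their median. The sketch below is a minimal illustration under that assumption: this record does not say which correction the authors used, and the allele counts and function names are hypothetical.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def allelic_chi2(case_counts, control_counts):
    """1-df chi-square statistic for a 2x2 table of allele counts.

    Each argument is a (minor_allele_count, major_allele_count) pair.
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def inflation_factor(stats):
    """Genomic-control lambda: observed median over the expected null median."""
    return median(stats) / CHI2_1DF_MEDIAN


def genomic_control(stats):
    """Divide every statistic by lambda (bounded below by 1, so statistics are never inflated)."""
    lam = max(1.0, inflation_factor(stats))
    return [s / lam for s in stats]


# Hypothetical allele counts for five SNPs:
# ((case minor, case major), (control minor, control major)).
toy_tables = [
    ((30, 70), (20, 80)),
    ((25, 75), (22, 78)),
    ((40, 60), (30, 70)),
    ((15, 85), (10, 90)),
    ((50, 50), (35, 65)),
]
stats = [allelic_chi2(cases, controls) for cases, controls in toy_tables]
lam = inflation_factor(stats)
corrected = genomic_control(stats)
print(f"lambda = {lam:.2f}")
```

For these toy counts λ comes out well above 1, the same kind of inflation that a QQ-plot departing from the diagonal reflects. After correction the median statistic matches the null median by construction.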

Bibliographic Details
Main authors: Arshadi, Niloofar; Chang, Billy; Kustra, Rafal
Format: Text
Language: English
Published: BioMed Central 2009
Subjects: Proceedings
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2795961/
https://www.ncbi.nlm.nih.gov/pubmed/20018054
author Arshadi, Niloofar
Chang, Billy
Kustra, Rafal
collection PubMed
description In this paper, we apply the gradient-boosting machine predictive model to the rheumatoid arthritis data for predicting the case-control status. QQ-plot suggests severe population stratification. In univariate genome-wide association studies, a correction factor for ethnicity confounding can be derived. Here we propose a novel strategy to deal with population stratification in the context of multivariate predictive modeling. We address the problem by clustering the subjects on the axes of genetic variations, and building a predictive model separately in each cluster. This allows us to control ethnicity without explicitly including it in the model, which could marginalize the genetic signal we are trying to discover. Clustering not only leads to more similar ethnicity groups but also, as our results show, increases the accuracy of our model when compared to the non-clustered approach. The highest accuracy is achieved with the model adjusted for population stratification, when the genetic axes of variation are included among the set of predictors, although this may be misleading given the confounding effects.
format Text
id pubmed-2795961
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2795961 2009-12-18 BMC Proc Proceedings BioMed Central 2009-12-15 /pmc/articles/PMC2795961/ /pubmed/20018054 Text en Copyright ©2009 Arshadi et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Predictive modeling in case-control single-nucleotide polymorphism studies in the presence of population stratification: a case study using Genetic Analysis Workshop 16 Problem 1 dataset
topic Proceedings
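
The strategy summarized in the description field above (cluster subjects on the axes of genetic variation, then build a separate predictive model in each cluster) can be sketched on toy data. Everything specific here is an assumption for illustration: the eight subjects are made up and are not GAW16 data, a one-dimensional two-means clustering on a single axis score stands in for whatever clustering the authors used, and a single-SNP threshold rule stands in for the gradient-boosting machine so that the example has no dependencies.

```python
def two_means(scores, iters=20):
    """Split subjects into two clusters by their score on one axis of variation."""
    centers = [min(scores), max(scores)]
    labels = []
    for _ in range(iters):
        labels = [0 if abs(s - centers[0]) <= abs(s - centers[1]) else 1
                  for s in scores]
        for c in (0, 1):
            members = [s for s, lab in zip(scores, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def predict(rule, genotype):
    """Apply a rule (snp index, threshold, case_if_high) to one genotype vector."""
    snp, threshold, case_if_high = rule
    high = genotype[snp] >= threshold
    return int(high == case_if_high)


def fit_rule(genotypes, status):
    """Stand-in for the predictive model: best single-SNP threshold rule."""
    best_correct, best_rule = -1, None
    for snp in range(len(genotypes[0])):
        for threshold in (1, 2):
            for case_if_high in (True, False):
                rule = (snp, threshold, case_if_high)
                correct = sum(predict(rule, g) == y
                              for g, y in zip(genotypes, status))
                if correct > best_correct:
                    best_correct, best_rule = correct, rule
    return best_rule


# Hypothetical subjects: axis score, minor-allele counts at two SNPs, case status.
scores = [-1.2, -0.9, -1.1, -0.8, 1.0, 1.3, 0.9, 1.1]
genotypes = [[2, 0], [2, 1], [0, 2], [0, 1], [0, 2], [2, 2], [2, 0], [0, 0]]
status = [1, 1, 0, 0, 1, 1, 0, 0]

# Non-clustered approach: one rule for everyone.
pooled_rule = fit_rule(genotypes, status)
pooled_acc = sum(predict(pooled_rule, g) == y
                 for g, y in zip(genotypes, status)) / len(status)

# Clustered approach: one rule per cluster, without using the axis as a predictor.
labels = two_means(scores)
rules = {}
for c in sorted(set(labels)):
    idx = [i for i, lab in enumerate(labels) if lab == c]
    rules[c] = fit_rule([genotypes[i] for i in idx], [status[i] for i in idx])
stratified_acc = sum(predict(rules[labels[i]], genotypes[i]) == status[i]
                     for i in range(len(status))) / len(status)

print(f"pooled training accuracy:     {pooled_acc:.2f}")
print(f"stratified training accuracy: {stratified_acc:.2f}")
```

The per-cluster rules fit the training subjects at least as well as the pooled rule. For training accuracy this is guaranteed, because the pooled rule is also a candidate within each cluster, so the comparison says nothing about held-out accuracy, which would need a separate validation split.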