Predictive modeling in case-control single-nucleotide polymorphism studies in the presence of population stratification: a case study using Genetic Analysis Workshop 16 Problem 1 dataset
In this paper, we apply the gradient-boosting machine predictive model to the rheumatoid arthritis data for predicting the case-control status. QQ-plot suggests severe population stratification. In univariate genome-wide association studies, a correction factor for ethnicity confounding can be deriv...
Main authors: | Arshadi, Niloofar; Chang, Billy; Kustra, Rafal |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2009 |
Subjects: | Proceedings |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2795961/ https://www.ncbi.nlm.nih.gov/pubmed/20018054 |
_version_ | 1782175480240144384 |
---|---|
author | Arshadi, Niloofar; Chang, Billy; Kustra, Rafal |
author_facet | Arshadi, Niloofar; Chang, Billy; Kustra, Rafal |
author_sort | Arshadi, Niloofar |
collection | PubMed |
description | In this paper, we apply the gradient-boosting machine predictive model to the rheumatoid arthritis data for predicting the case-control status. QQ-plot suggests severe population stratification. In univariate genome-wide association studies, a correction factor for ethnicity confounding can be derived. Here we propose a novel strategy to deal with population stratification in the context of multivariate predictive modeling. We address the problem by clustering the subjects on the axes of genetic variations, and building a predictive model separately in each cluster. This allows us to control ethnicity without explicitly including it in the model, which could marginalize the genetic signal we are trying to discover. Clustering not only leads to more similar ethnicity groups but also, as our results show, increases the accuracy of our model when compared to the non-clustered approach. The highest accuracy is achieved with the model adjusted for population stratification, when the genetic axes of variation are included among the set of predictors, although this may be misleading given the confounding effects. |
format | Text |
id | pubmed-2795961 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-27959612009-12-18 Predictive modeling in case-control single-nucleotide polymorphism studies in the presence of population stratification: a case study using Genetic Analysis Workshop 16 Problem 1 dataset Arshadi, Niloofar Chang, Billy Kustra, Rafal BMC Proc Proceedings In this paper, we apply the gradient-boosting machine predictive model to the rheumatoid arthritis data for predicting the case-control status. QQ-plot suggests severe population stratification. In univariate genome-wide association studies, a correction factor for ethnicity confounding can be derived. Here we propose a novel strategy to deal with population stratification in the context of multivariate predictive modeling. We address the problem by clustering the subjects on the axes of genetic variations, and building a predictive model separately in each cluster. This allows us to control ethnicity without explicitly including it in the model, which could marginalize the genetic signal we are trying to discover. Clustering not only leads to more similar ethnicity groups but also, as our results show, increases the accuracy of our model when compared to the non-clustered approach. The highest accuracy is achieved with the model adjusted for population stratification, when the genetic axes of variation are included among the set of predictors, although this may be misleading given the confounding effects. BioMed Central 2009-12-15 /pmc/articles/PMC2795961/ /pubmed/20018054 Text en Copyright ©2009 Arshadi et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Proceedings Arshadi, Niloofar Chang, Billy Kustra, Rafal Predictive modeling in case-control single-nucleotide polymorphism studies in the presence of population stratification: a case study using Genetic Analysis Workshop 16 Problem 1 dataset |
title | Predictive modeling in case-control single-nucleotide polymorphism studies in the presence of population stratification: a case study using Genetic Analysis Workshop 16 Problem 1 dataset |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2795961/ https://www.ncbi.nlm.nih.gov/pubmed/20018054 |
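
The strategy summarized in the description (cluster the subjects on the axes of genetic variation, then build a separate predictive model in each cluster) can be illustrated with a minimal sketch. This is not the authors' implementation, and it rests on several assumptions: the axes are taken as precomputed coordinates per subject, the clustering is plain k-means (the record does not name a clustering algorithm), and the gradient-boosting machine is swapped for a single-SNP lookup rule to keep the example short and dependency-free. All function names and the toy data are hypothetical.

```python
from collections import defaultdict

def dist2(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    # Re-assign once more so the labels match the final centers.
    labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
    return labels, centers

def fit_stump(genotypes, status):
    """Stand-in for the per-cluster model (NOT gradient boosting):
    pick the one SNP whose genotype -> majority-status table fits best."""
    best = None
    for j in range(len(genotypes[0])):
        counts = defaultdict(lambda: [0, 0])  # genotype -> [controls, cases]
        for g, y in zip(genotypes, status):
            counts[g[j]][y] += 1
        table = {geno: int(c[1] > c[0]) for geno, c in counts.items()}
        correct = sum(table[g[j]] == y for g, y in zip(genotypes, status))
        if best is None or correct > best[0]:
            best = (correct, j, table)
    default = int(sum(status) * 2 > len(status))  # fallback for unseen genotypes
    return best[1], best[2], default

def fit_stratified(axes, genotypes, status, k=2):
    """Cluster subjects on the axes, then fit one model per cluster."""
    labels, centers = kmeans(axes, k)
    models = {}
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if idx:
            models[c] = fit_stump([genotypes[i] for i in idx],
                                  [status[i] for i in idx])
    return centers, models

def predict(centers, models, axis_point, genotype):
    """Route a subject to the nearest cluster and apply that cluster's model."""
    c = min(models, key=lambda m: dist2(axis_point, centers[m]))
    snp, table, default = models[c]
    return table.get(genotype[snp], default)

# Toy data (hypothetical): two well-separated ancestry groups; status 1 = case.
axes = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (0.0, 0.0),
        (10.0, 10.1), (10.2, 10.0), (10.1, 10.2), (10.0, 10.0)]
genotypes = [(2, 0), (2, 1), (0, 0), (0, 1),
             (0, 2), (1, 2), (0, 0), (1, 0)]
status = [1, 1, 0, 0, 1, 1, 0, 0]

centers, models = fit_stratified(axes, genotypes, status, k=2)
print(predict(centers, models, (0.1, 0.1), (2, 0)))
print(predict(centers, models, (10.1, 10.0), (2, 0)))
```

In the toy data the informative SNP differs between the two groups, so the same genotype is routed to different per-cluster rules depending on where the subject sits on the axes. The axes are used only for routing and never enter a per-cluster model as predictors, which mirrors the description's point about controlling for ethnicity without including it in the model.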