
Analyses of single marker and pairwise effects of candidate loci for rheumatoid arthritis using logistic regression and random forests

Using parametric and nonparametric techniques, our study investigated the presence of single locus and pairwise effects between 20 markers of the Genetic Analysis Workshop 15 (GAW15) North American Rheumatoid Arthritis Consortium (NARAC) candidate gene data set (Problem 2), analyzing 463 independent...

Bibliographic Details
Main authors: Glaser, Beate; Nikolov, Ivan; Chubb, Daniel; Hamshere, Marian L; Segurado, Ricardo; Moskvina, Valentina; Holmans, Peter
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2367457/
https://www.ncbi.nlm.nih.gov/pubmed/18466554
author Glaser, Beate
Nikolov, Ivan
Chubb, Daniel
Hamshere, Marian L
Segurado, Ricardo
Moskvina, Valentina
Holmans, Peter
author_sort Glaser, Beate
collection PubMed
description Using parametric and nonparametric techniques, our study investigated the presence of single-locus and pairwise effects between 20 markers of the Genetic Analysis Workshop 15 (GAW15) North American Rheumatoid Arthritis Consortium (NARAC) candidate gene data set (Problem 2), analyzing 463 independent patients and 855 controls. Specifically, our work examined the correspondence between logistic regression (LR) analysis of single-locus and pairwise interaction effects, and random forest (RF) single and joint importance measures. For this comparison, we selected small but stable RFs (500 trees), whose importance measures correlated strongly (r~0.98) with those of RFs grown on 5000 trees. Both RF importance measures captured most of the LR single-locus and pairwise interaction effects, while joint importance measures also corresponded to full LR models containing main and interaction effects. We furthermore showed that RF measures were particularly sensitive to data imputation. The most consistent pairwise effect on rheumatoid arthritis was found between two markers within MAP3K7IP2/SUMO4 on 6q25.1, although LR and RFs assigned different significance levels. Within a hypothetical two-stage design, restricting pairwise LR analysis to markers with significant RF single importance would have reduced the number of possible combinations in our small data set by 61%, whereas joint importance measures would have been less efficient for marker pair reduction. This suggests that RF single importance measures, which are able to detect a wide range of interaction effects and are computationally very efficient, might be exploited as a pre-screening tool for larger association studies. Follow-up analysis, such as by LR, is required since RFs do not indicate high-risk genotype combinations.
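The two-stage design mentioned in the abstract rests on simple pair counting: with 20 markers there are 190 unordered marker pairs, and screening out markers in a first stage shrinks the number of pairwise tests roughly quadratically. The following is a minimal Python sketch of that arithmetic only. The function names and the example values of `n_kept` are hypothetical; the abstract does not state how many markers passed the RF screen, and this is not the authors' code.

```python
from math import comb


def pair_count(n_markers: int) -> int:
    """Number of unordered marker pairs among n_markers markers."""
    return comb(n_markers, 2)


def pair_reduction(n_total: int, n_kept: int) -> float:
    """Fraction of pairwise tests avoided when only n_kept of n_total
    markers pass a first-stage screen (hypothetical helper)."""
    if not 0 <= n_kept <= n_total:
        raise ValueError("n_kept must lie between 0 and n_total")
    total = pair_count(n_total)
    if total == 0:
        return 0.0
    return 1.0 - pair_count(n_kept) / total


if __name__ == "__main__":
    n_total = 20  # number of markers given in the abstract
    print(f"all pairs among {n_total} markers: {pair_count(n_total)}")
    # The n_kept values below are illustrative assumptions, not study results.
    for n_kept in (10, 15):
        frac = pair_reduction(n_total, n_kept)
        print(f"keeping {n_kept} markers avoids {frac:.0%} of pairwise tests")
```

Because the pair count grows with the square of the number of markers, even a modest first-stage filter removes a large share of second-stage tests, which is why a cheap single-marker screen is attractive for larger studies.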
format Text
id pubmed-2367457
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2367457 2008-05-06 BMC Proc Proceedings BioMed Central 2007-12-18 /pmc/articles/PMC2367457/ /pubmed/18466554 Text en Copyright © 2007 Glaser et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Analyses of single marker and pairwise effects of candidate loci for rheumatoid arthritis using logistic regression and random forests
topic Proceedings