
Detecting significant single-nucleotide polymorphisms in a rheumatoid arthritis study using random forests

Bibliographic Details
Main authors: Wang, Minghui, Chen, Xiang, Zhang, Meizhuo, Zhu, Wensheng, Cho, Kelly, Zhang, Heping
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2795970/
https://www.ncbi.nlm.nih.gov/pubmed/20018063
_version_ 1782175482334150656
author Wang, Minghui
Chen, Xiang
Zhang, Meizhuo
Zhu, Wensheng
Cho, Kelly
Zhang, Heping
collection PubMed
description Random forest is an efficient approach for investigating not only the effects of individual markers on a trait but also the effects of interactions among the markers in genetic association studies. This approach is especially appealing for the analysis of genome-wide data, such as those obtained from gene expression/single-nucleotide polymorphism (SNP) array experiments in which the number of candidate genes/SNPs is vast. We applied this approach to the Genetic Analysis Workshop 16 Problem 1 data to identify SNPs that contribute to rheumatoid arthritis. The random forest computed a raw importance score for each SNP marker, where a higher importance score suggests a stronger association between the marker and the trait. The significance level of the association was determined empirically by repeatedly reapplying the random forest to randomly generated data under the null hypothesis that no association exists between the markers and the trait. Using random forest, we were able to identify 228 significant SNPs (at the genome-wide significance level of 0.05) across the whole genome, over two-thirds of which are located on chromosome 6, especially clustered in the 6p21 region containing the human leukocyte antigen (HLA) genes, such as HLA-DRB1 and HLA-DRA. Further analysis of this region indicates a strong association with rheumatoid arthritis status.
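The empirical significance step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the null data were generated or how the genome-wide threshold was derived, so the sketch assumes label permutation and a max-statistic comparison across markers. To stay dependency-free it also swaps the random forest importance for a simple stand-in score (absolute difference in mean genotype between cases and controls). The names `mean_difference_score` and `empirical_p_values`, and the toy data, are hypothetical.

```python
import random


def mean_difference_score(genotypes, labels):
    """Stand-in for a random forest importance score (assumption, not the
    paper's measure): |mean genotype in cases - mean genotype in controls|."""
    cases = [row for row, y in zip(genotypes, labels) if y == 1]
    controls = [row for row, y in zip(genotypes, labels) if y == 0]
    scores = []
    for j in range(len(genotypes[0])):
        case_mean = sum(row[j] for row in cases) / len(cases)
        control_mean = sum(row[j] for row in controls) / len(controls)
        scores.append(abs(case_mean - control_mean))
    return scores


def empirical_p_values(genotypes, labels, importance_fn,
                       n_permutations=200, seed=0):
    """Compare each marker's observed score with the null distribution of the
    maximum score over all markers, obtained by shuffling the trait labels."""
    rng = random.Random(seed)
    observed = importance_fn(genotypes, labels)
    shuffled = list(labels)
    null_max = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any marker-trait association
        null_max.append(max(importance_fn(genotypes, shuffled)))
    p_values = []
    for score in observed:
        exceed = sum(1 for m in null_max if m >= score)
        # +1 in numerator and denominator keeps the estimate above zero
        p_values.append((exceed + 1) / (n_permutations + 1))
    return observed, p_values


# Toy data: 20 cases, 20 controls, 5 markers coded 0/1/2.
# Marker 0 is constructed to track case status; markers 1-4 are noise.
data_rng = random.Random(1)
labels = [1] * 20 + [0] * 20
genotypes = []
for y in labels:
    row = [2 if y == 1 else 0]
    row += [data_rng.randint(0, 2) for _ in range(4)]
    genotypes.append(row)

observed, p_values = empirical_p_values(genotypes, labels, mean_difference_score)
significant = [j for j, p in enumerate(p_values) if p < 0.05]
```

To get closer to the procedure in the abstract, `importance_fn` could be replaced by a function that fits a random forest and returns its per-marker importance scores; the permutation loop would stay the same.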
format Text
id pubmed-2795970
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2795970 2009-12-18 BMC Proc Proceedings BioMed Central 2009-12-15 /pmc/articles/PMC2795970/ /pubmed/20018063 Text en Copyright ©2009 Wang et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Detecting significant single-nucleotide polymorphisms in a rheumatoid arthritis study using random forests
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2795970/
https://www.ncbi.nlm.nih.gov/pubmed/20018063