Detecting significant single-nucleotide polymorphisms in a rheumatoid arthritis study using random forests
Random forest is an efficient approach for investigating not only the effects of individual markers on a trait but also the effect of the interactions among the markers in genetic association studies. This approach is especially appealing for the analysis of genome-wide data, such as those obtained...
Main authors: | Wang, Minghui; Chen, Xiang; Zhang, Meizhuo; Zhu, Wensheng; Cho, Kelly; Zhang, Heping |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2009 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2795970/ https://www.ncbi.nlm.nih.gov/pubmed/20018063 |
_version_ | 1782175482334150656 |
---|---|
author | Wang, Minghui; Chen, Xiang; Zhang, Meizhuo; Zhu, Wensheng; Cho, Kelly; Zhang, Heping |
author_sort | Wang, Minghui |
collection | PubMed |
description | Random forest is an efficient approach for investigating not only the effects of individual markers on a trait but also the effects of interactions among the markers in genetic association studies. This approach is especially appealing for the analysis of genome-wide data, such as those obtained from gene expression or single-nucleotide polymorphism (SNP) array experiments, in which the number of candidate genes or SNPs is vast. We applied this approach to the Genetic Analysis Workshop 16 Problem 1 data to identify SNPs that contribute to rheumatoid arthritis. The random forest computed a raw importance score for each SNP marker, where a higher importance score suggests a stronger association between the marker and the trait. The significance level of the association was determined empirically by repeatedly reapplying the random forest to randomly generated data under the null hypothesis that no association exists between the markers and the trait. Using random forest, we identified 228 significant SNPs (at the genome-wide significance level of 0.05) across the whole genome, over two-thirds of which are located on chromosome 6, clustered especially in the 6p21 region containing the human leukocyte antigen (HLA) genes, such as HLA-DRB1 and HLA-DRA. Further analysis of this region indicates a strong association with rheumatoid arthritis status. |
format | Text |
id | pubmed-2795970 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2795970; 2009-12-18; BMC Proc; Proceedings; BioMed Central; 2009-12-15; /pmc/articles/PMC2795970/; /pubmed/20018063; Text; en; Copyright ©2009 Wang et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Detecting significant single-nucleotide polymorphisms in a rheumatoid arthritis study using random forests |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2795970/ https://www.ncbi.nlm.nih.gov/pubmed/20018063 |
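
The description in this record says that significance was assessed empirically by recomputing importance scores on data generated under the null hypothesis. A minimal sketch of that general idea follows, under several assumptions that are not in the record: the null data are produced by permuting case/control labels, the genome-wide threshold is the empirical (1 - alpha) quantile of the per-permutation maximum score, and a simple difference in mean genotype stands in for the random forest raw importance score so the example runs with the Python standard library only. All function names and the toy data are hypothetical, and this is not the authors' implementation.

```python
import random


def importance_scores(genotypes, labels):
    """Stand-in importance score (NOT a random forest): for each SNP,
    the absolute difference in mean genotype between cases and controls."""
    cases = [row for row, y in zip(genotypes, labels) if y == 1]
    controls = [row for row, y in zip(genotypes, labels) if y == 0]
    scores = []
    for j in range(len(genotypes[0])):
        mean_case = sum(row[j] for row in cases) / len(cases)
        mean_ctrl = sum(row[j] for row in controls) / len(controls)
        scores.append(abs(mean_case - mean_ctrl))
    return scores


def genome_wide_threshold(genotypes, labels, score_fn,
                          n_perm=200, alpha=0.05, seed=0):
    """Empirical threshold: permute the labels (assumed null model),
    record the maximum score over all SNPs for each permutation, and
    take the (1 - alpha) quantile of those maxima."""
    rng = random.Random(seed)
    shuffled = list(labels)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any marker-trait association
        null_max.append(max(score_fn(genotypes, shuffled)))
    null_max.sort()
    k = min(n_perm - 1, int((1 - alpha) * n_perm))
    return null_max[k]


def significant_snps(genotypes, labels, score_fn=importance_scores, **kwargs):
    """Return indices of SNPs whose observed score exceeds the threshold,
    together with the threshold itself."""
    threshold = genome_wide_threshold(genotypes, labels, score_fn, **kwargs)
    observed = score_fn(genotypes, labels)
    hits = [j for j, s in enumerate(observed) if s > threshold]
    return hits, threshold


def make_toy_data(seed=1, n_per_group=50, n_snps=10):
    """Hypothetical toy data: SNP 0 is perfectly associated with the
    trait; the remaining SNPs are random genotypes coded 0, 1 or 2."""
    rng = random.Random(seed)
    labels = [1] * n_per_group + [0] * n_per_group
    genotypes = []
    for y in labels:
        row = [2 if y == 1 else 0]
        row += [rng.choice([0, 1, 2]) for _ in range(n_snps - 1)]
        genotypes.append(row)
    return genotypes, labels


if __name__ == "__main__":
    genotypes, labels = make_toy_data()
    hits, threshold = significant_snps(genotypes, labels, n_perm=100, seed=2)
    print("threshold:", threshold)
    print("significant SNP indices:", hits)
```

Because `score_fn` is passed in as a parameter, a real random forest importance measure could be substituted without changing the thresholding logic; taking the maximum over all SNPs in each permutation is one way to control the error rate across the whole genome rather than per marker.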