EM-random forest and new measures of variable importance for multi-locus quantitative trait linkage analysis
Motivation: We developed an EM-random forest (EMRF) for Haseman–Elston quantitative trait linkage analysis that accounts for marker ambiguity and weights each sib-pair according to the posterior identical-by-descent (IBD) distribution. The usual random forest (RF) variable importance (VI) index used...
Main authors: | Lee, Sophia S. F.; Sun, Lei; Kustra, Rafal; Bull, Shelley B. |
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press, 2008 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2638262/ https://www.ncbi.nlm.nih.gov/pubmed/18499695 http://dx.doi.org/10.1093/bioinformatics/btn239 |
_version_ | 1782164405708914688 |
---|---|
author | Lee, Sophia S. F.; Sun, Lei; Kustra, Rafal; Bull, Shelley B. |
author_facet | Lee, Sophia S. F.; Sun, Lei; Kustra, Rafal; Bull, Shelley B. |
author_sort | Lee, Sophia S. F. |
collection | PubMed |
description | Motivation: We developed an EM-random forest (EMRF) for Haseman–Elston quantitative trait linkage analysis that accounts for marker ambiguity and weights each sib-pair according to the posterior identical-by-descent (IBD) distribution. The usual random forest (RF) variable importance (VI) index used to rank markers for variable selection is not optimal when applied to linkage data because of correlation between markers. We define new VI indices that borrow information from linked markers using the correlation structure inherent in IBD linkage data. Results: Using simulations, we find that the new VI indices in EMRF performed better than the original RF VI index and performed similarly to or better than the EM-Haseman–Elston regression LOD score for various genetic models. Moreover, tree size and the size of the marker subset evaluated at each node are important considerations in RFs. Availability: The source code for EMRF, written in C, is available at www.infornomics.utoronto.ca/downloads/EMRF Contact: bull@mshri.on.ca Supplementary information: Supplementary data are available at www.infornomics.utoronto.ca/downloads/EMRF |
format | Text |
id | pubmed-2638262 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-2638262; 2009-02-25; EM-random forest and new measures of variable importance for multi-locus quantitative trait linkage analysis; Lee, Sophia S. F.; Sun, Lei; Kustra, Rafal; Bull, Shelley B.; Bioinformatics; Original Papers; Oxford University Press; 2008-07-15; 2008-05-21; /pmc/articles/PMC2638262/ /pubmed/18499695 http://dx.doi.org/10.1093/bioinformatics/btn239; Text; en; © 2008 The Author(s) http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Papers; Lee, Sophia S. F.; Sun, Lei; Kustra, Rafal; Bull, Shelley B.; EM-random forest and new measures of variable importance for multi-locus quantitative trait linkage analysis |
title | EM-random forest and new measures of variable importance for multi-locus quantitative trait linkage analysis |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2638262/ https://www.ncbi.nlm.nih.gov/pubmed/18499695 http://dx.doi.org/10.1093/bioinformatics/btn239 |
work_keys_str_mv | AT leesophiasf emrandomforestandnewmeasuresofvariableimportanceformultilocusquantitativetraitlinkageanalysis AT sunlei emrandomforestandnewmeasuresofvariableimportanceformultilocusquantitativetraitlinkageanalysis AT kustrarafal emrandomforestandnewmeasuresofvariableimportanceformultilocusquantitativetraitlinkageanalysis AT bullshelleyb emrandomforestandnewmeasuresofvariableimportanceformultilocusquantitativetraitlinkageanalysis |
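
The description above builds on Haseman–Elston regression, in which the squared trait difference of each sib-pair is regressed on the proportion of alleles the pair shares identical by descent (IBD) at a marker; a negative slope suggests linkage. The following is a minimal sketch of that classic single-marker regression only, not the authors' EMRF (their C source is at the URL given in the record). The function names, the simulation model, and all parameter values are assumptions made for illustration.

```python
import random


def haseman_elston_slope(pi_hat, sq_diff):
    """Ordinary least squares fit of squared trait differences on IBD sharing.

    pi_hat  -- estimated proportion of alleles shared IBD per sib-pair (0..1)
    sq_diff -- squared trait difference per sib-pair
    Returns (slope, intercept). A negative slope is the linkage signal.
    """
    n = len(pi_hat)
    mean_x = sum(pi_hat) / n
    mean_y = sum(sq_diff) / n
    sxx = sum((x - mean_x) ** 2 for x in pi_hat)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pi_hat, sq_diff))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def simulate_sib_pairs(n_pairs, qtl_sd, noise_sd, rng):
    """Hypothetical toy simulation (an assumption, not taken from the paper).

    Each pair shares 0, 1 or 2 alleles IBD with probabilities 1/4, 1/2, 1/4.
    The sibs' genetic effects are Gaussian with correlation equal to the
    IBD proportion, so pairs sharing more alleles have more similar traits.
    """
    pi_hat, sq_diff = [], []
    for _ in range(n_pairs):
        pi = rng.choices([0, 1, 2], weights=[1, 2, 1])[0] / 2.0
        shared = rng.gauss(0.0, 1.0)
        g1 = qtl_sd * (pi ** 0.5 * shared + (1 - pi) ** 0.5 * rng.gauss(0.0, 1.0))
        g2 = qtl_sd * (pi ** 0.5 * shared + (1 - pi) ** 0.5 * rng.gauss(0.0, 1.0))
        y1 = g1 + rng.gauss(0.0, noise_sd)
        y2 = g2 + rng.gauss(0.0, noise_sd)
        pi_hat.append(pi)
        sq_diff.append((y1 - y2) ** 2)
    return pi_hat, sq_diff


if __name__ == "__main__":
    rng = random.Random(0)
    pi_hat, sq_diff = simulate_sib_pairs(2000, qtl_sd=1.0, noise_sd=1.0, rng=rng)
    slope, intercept = haseman_elston_slope(pi_hat, sq_diff)
    print(f"slope={slope:.3f} intercept={intercept:.3f}")
```

Under this toy model the expected squared difference is 2·qtl_sd²·(1 − π) + 2·noise_sd², so the fitted slope should be near −2·qtl_sd². The sketch assumes IBD sharing is known exactly; handling ambiguous markers through the posterior IBD distribution, as the description outlines, is not shown here.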