
EM-random forest and new measures of variable importance for multi-locus quantitative trait linkage analysis

Motivation: We developed an EM-random forest (EMRF) for Haseman–Elston quantitative trait linkage analysis that accounts for marker ambiguity and weighs each sib-pair according to the posterior identical by descent (IBD) distribution. The usual random forest (RF) variable importance (VI) index used...


Bibliographic Details
Main Authors: Lee, Sophia S. F., Sun, Lei, Kustra, Rafal, Bull, Shelley B.
Format: Text
Language: English
Published: Oxford University Press 2008
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2638262/
https://www.ncbi.nlm.nih.gov/pubmed/18499695
http://dx.doi.org/10.1093/bioinformatics/btn239
author Lee, Sophia S. F.
Sun, Lei
Kustra, Rafal
Bull, Shelley B.
collection PubMed
description Motivation: We developed an EM-random forest (EMRF) for Haseman–Elston quantitative trait linkage analysis that accounts for marker ambiguity and weighs each sib-pair according to the posterior identical by descent (IBD) distribution. The usual random forest (RF) variable importance (VI) index used to rank markers for variable selection is not optimal when applied to linkage data because of correlation between markers. We define new VI indices that borrow information from linked markers using the correlation structure inherent in IBD linkage data. Results: Using simulations, we find that the new VI indices in EMRF performed better than the original RF VI index and performed similarly or better than EM-Haseman–Elston regression LOD score for various genetic models. Moreover, tree size and markers subset size evaluated at each node are important considerations in RFs. Availability: The source code for EMRF written in C is available at www.infornomics.utoronto.ca/downloads/EMRF Contact: bull@mshri.on.ca Supplementary information: Supplementary data are available at www.infornomics.utoronto.ca/downloads/EMRF
format Text
id pubmed-2638262
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-2638262 2009-02-25 EM-random forest and new measures of variable importance for multi-locus quantitative trait linkage analysis Lee, Sophia S. F. Sun, Lei Kustra, Rafal Bull, Shelley B. Bioinformatics Original Papers Motivation: We developed an EM-random forest (EMRF) for Haseman–Elston quantitative trait linkage analysis that accounts for marker ambiguity and weighs each sib-pair according to the posterior identical by descent (IBD) distribution. The usual random forest (RF) variable importance (VI) index used to rank markers for variable selection is not optimal when applied to linkage data because of correlation between markers. We define new VI indices that borrow information from linked markers using the correlation structure inherent in IBD linkage data. Results: Using simulations, we find that the new VI indices in EMRF performed better than the original RF VI index and performed similarly or better than EM-Haseman–Elston regression LOD score for various genetic models. Moreover, tree size and markers subset size evaluated at each node are important considerations in RFs. Availability: The source code for EMRF written in C is available at www.infornomics.utoronto.ca/downloads/EMRF Contact: bull@mshri.on.ca Supplementary information: Supplementary data are available at www.infornomics.utoronto.ca/downloads/EMRF Oxford University Press 2008-07-15 2008-05-21 /pmc/articles/PMC2638262/ /pubmed/18499695 http://dx.doi.org/10.1093/bioinformatics/btn239 Text en © 2008 The Author(s) http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title EM-random forest and new measures of variable importance for multi-locus quantitative trait linkage analysis
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2638262/
https://www.ncbi.nlm.nih.gov/pubmed/18499695
http://dx.doi.org/10.1093/bioinformatics/btn239