
Selection of important variables by statistical learning in genome-wide association analysis

Genetic analysis of complex diseases demands novel analytical methods to interpret data collected on thousands of variables by genome-wide association studies. The complexity of such analysis is multiplied when one has to consider interaction effects, be they among the genetic variations (G × G) or...

Bibliographic Details
Main Authors: Yang, Wei (Will); Gu, C Charles
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2795972/
https://www.ncbi.nlm.nih.gov/pubmed/20018065
author Yang, Wei (Will)
Gu, C Charles
collection PubMed
description Genetic analysis of complex diseases demands novel analytical methods to interpret data collected on thousands of variables by genome-wide association studies. The complexity of such analysis is multiplied when one has to consider interaction effects, be they among the genetic variations (G × G) or with environmental risk factors (G × E). Several statistical learning methods seem quite promising in this context. Herein we consider applications of two such methods, random forest and Bayesian networks, to the simulated dataset for Genetic Analysis Workshop 16 Problem 3. Our evaluation study showed that an iterative search based on the random forest approach has potential for selecting important variables, while Bayesian networks can capture some of the underlying causal relationships.
format Text
id pubmed-2795972
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2795972 2009-12-18 Yang, Wei (Will); Gu, C Charles. BMC Proc (Proceedings). BioMed Central 2009-12-15 /pmc/articles/PMC2795972/ /pubmed/20018065 Text en Copyright ©2009 Yang and Gu; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Selection of important variables by statistical learning in genome-wide association analysis
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2795972/
https://www.ncbi.nlm.nih.gov/pubmed/20018065