Variable selection for large p small n regression models with incomplete data: Mapping QTL with epistases

BACKGROUND: Identifying quantitative trait loci (QTL) for both additive and epistatic effects raises the statistical issue of selecting variables from a large number of candidates using a small number of observations. Missing trait and/or marker values prevent one from directly applying the classica...

Bibliographic Details
Main authors: Zhang, Min; Zhang, Dabao; Wells, Martin T
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2435550/
https://www.ncbi.nlm.nih.gov/pubmed/18510743
http://dx.doi.org/10.1186/1471-2105-9-251
author Zhang, Min
Zhang, Dabao
Wells, Martin T
collection PubMed
description BACKGROUND: Identifying quantitative trait loci (QTL) for both additive and epistatic effects raises the statistical issue of selecting variables from a large number of candidates using a small number of observations. Missing trait and/or marker values prevent one from directly applying classical model selection criteria such as Akaike's information criterion (AIC) and the Bayesian information criterion (BIC). RESULTS: We propose a two-step Bayesian variable selection method that deals with the sparse parameter space and the small sample size. The regression coefficient priors are flexible enough to incorporate the characteristics of "large p small n" data. Specifically, sparseness and possible asymmetry of the significant coefficients are dealt with by developing a Gibbs sampling algorithm to stochastically search through low-dimensional subspaces for significant variables. The superior performance of the approach is demonstrated via a simulation study. We also applied it to real QTL mapping datasets. CONCLUSION: The two-step procedure coupled with Bayesian classification offers flexibility in modeling "large p small n" data, especially for the sparse and asymmetric parameter space. This approach can be extended to other settings characterized by high dimension and low sample size.
format Text
id pubmed-2435550
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2435550 2008-06-24 Variable selection for large p small n regression models with incomplete data: Mapping QTL with epistases. Zhang, Min; Zhang, Dabao; Wells, Martin T. BMC Bioinformatics. Methodology Article. BioMed Central 2008-05-29 /pmc/articles/PMC2435550/ /pubmed/18510743 http://dx.doi.org/10.1186/1471-2105-9-251 Text en Copyright © 2008 Zhang et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Variable selection for large p small n regression models with incomplete data: Mapping QTL with epistases
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2435550/
https://www.ncbi.nlm.nih.gov/pubmed/18510743
http://dx.doi.org/10.1186/1471-2105-9-251