Variable selection for large p small n regression models with incomplete data: Mapping QTL with epistases

BACKGROUND: Identifying quantitative trait loci (QTL) for both additive and epistatic effects raises the statistical issue of selecting variables from a large number of candidates using a small number of observations. Missing trait and/or marker values prevent one from directly applying the classica...

Bibliographic Details
Main Authors:	Zhang, Min; Zhang, Dabao; Wells, Martin T
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2435550/ https://www.ncbi.nlm.nih.gov/pubmed/18510743 http://dx.doi.org/10.1186/1471-2105-9-251

author	Zhang, Min Zhang, Dabao Wells, Martin T
collection	PubMed
description	BACKGROUND: Identifying quantitative trait loci (QTL) for both additive and epistatic effects raises the statistical issue of selecting variables from a large number of candidates using a small number of observations. Missing trait and/or marker values prevent one from directly applying the classical model selection criteria such as Akaike's information criterion (AIC) and Bayesian information criterion (BIC). RESULTS: We propose a two-step Bayesian variable selection method which deals with the sparse parameter space and the small sample size issues. The regression coefficient priors are flexible enough to incorporate the characteristic of "large p small n" data. Specifically, sparseness and possible asymmetry of the significant coefficients are dealt with by developing a Gibbs sampling algorithm to stochastically search through low-dimensional subspaces for significant variables. The superior performance of the approach is demonstrated via simulation study. We also applied it to real QTL mapping datasets. CONCLUSION: The two-step procedure coupled with Bayesian classification offers flexibility in modeling "large p small n" data, especially for the sparse and asymmetric parameter space. This approach can be extended to other settings characterized by high dimension and low sample size.
format	Text
id	pubmed-2435550
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-24355502008-06-24 Variable selection for large p small n regression models with incomplete data: Mapping QTL with epistases Zhang, Min Zhang, Dabao Wells, Martin T BMC Bioinformatics Methodology Article BACKGROUND: Identifying quantitative trait loci (QTL) for both additive and epistatic effects raises the statistical issue of selecting variables from a large number of candidates using a small number of observations. Missing trait and/or marker values prevent one from directly applying the classical model selection criteria such as Akaike's information criterion (AIC) and Bayesian information criterion (BIC). RESULTS: We propose a two-step Bayesian variable selection method which deals with the sparse parameter space and the small sample size issues. The regression coefficient priors are flexible enough to incorporate the characteristic of "large p small n" data. Specifically, sparseness and possible asymmetry of the significant coefficients are dealt with by developing a Gibbs sampling algorithm to stochastically search through low-dimensional subspaces for significant variables. The superior performance of the approach is demonstrated via simulation study. We also applied it to real QTL mapping datasets. CONCLUSION: The two-step procedure coupled with Bayesian classification offers flexibility in modeling "large p small n" data, especially for the sparse and asymmetric parameter space. This approach can be extended to other settings characterized by high dimension and low sample size. BioMed Central 2008-05-29 /pmc/articles/PMC2435550/ /pubmed/18510743 http://dx.doi.org/10.1186/1471-2105-9-251 Text en Copyright © 2008 Zhang et al; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( (http://creativecommons.org/licenses/by/2.0) ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Variable selection for large p small n regression models with incomplete data: Mapping QTL with epistases
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2435550/ https://www.ncbi.nlm.nih.gov/pubmed/18510743 http://dx.doi.org/10.1186/1471-2105-9-251
