Variable selection for large p small n regression models with incomplete data: Mapping QTL with epistases
BACKGROUND: Identifying quantitative trait loci (QTL) for both additive and epistatic effects raises the statistical issue of selecting variables from a large number of candidates using a small number of observations. Missing trait and/or marker values prevent one from directly applying the classica...
Main authors: | Zhang, Min; Zhang, Dabao; Wells, Martin T |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2008 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2435550/ https://www.ncbi.nlm.nih.gov/pubmed/18510743 http://dx.doi.org/10.1186/1471-2105-9-251 |
_version_ | 1782156493184827392 |
---|---|
author | Zhang, Min Zhang, Dabao Wells, Martin T |
author_facet | Zhang, Min Zhang, Dabao Wells, Martin T |
author_sort | Zhang, Min |
collection | PubMed |
description | BACKGROUND: Identifying quantitative trait loci (QTL) for both additive and epistatic effects raises the statistical issue of selecting variables from a large number of candidates using a small number of observations. Missing trait and/or marker values prevent one from directly applying the classical model selection criteria such as Akaike's information criterion (AIC) and Bayesian information criterion (BIC). RESULTS: We propose a two-step Bayesian variable selection method which deals with the sparse parameter space and the small sample size issues. The regression coefficient priors are flexible enough to incorporate the characteristic of "large p small n" data. Specifically, sparseness and possible asymmetry of the significant coefficients are dealt with by developing a Gibbs sampling algorithm to stochastically search through low-dimensional subspaces for significant variables. The superior performance of the approach is demonstrated via simulation study. We also applied it to real QTL mapping datasets. CONCLUSION: The two-step procedure coupled with Bayesian classification offers flexibility in modeling "large p small n" data, especially for the sparse and asymmetric parameter space. This approach can be extended to other settings characterized by high dimension and low sample size. |
format | Text |
id | pubmed-2435550 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2435550 2008-06-24 Variable selection for large p small n regression models with incomplete data: Mapping QTL with epistases Zhang, Min Zhang, Dabao Wells, Martin T BMC Bioinformatics Methodology Article BACKGROUND: Identifying quantitative trait loci (QTL) for both additive and epistatic effects raises the statistical issue of selecting variables from a large number of candidates using a small number of observations. Missing trait and/or marker values prevent one from directly applying the classical model selection criteria such as Akaike's information criterion (AIC) and Bayesian information criterion (BIC). RESULTS: We propose a two-step Bayesian variable selection method which deals with the sparse parameter space and the small sample size issues. The regression coefficient priors are flexible enough to incorporate the characteristic of "large p small n" data. Specifically, sparseness and possible asymmetry of the significant coefficients are dealt with by developing a Gibbs sampling algorithm to stochastically search through low-dimensional subspaces for significant variables. The superior performance of the approach is demonstrated via simulation study. We also applied it to real QTL mapping datasets. CONCLUSION: The two-step procedure coupled with Bayesian classification offers flexibility in modeling "large p small n" data, especially for the sparse and asymmetric parameter space. This approach can be extended to other settings characterized by high dimension and low sample size. BioMed Central 2008-05-29 /pmc/articles/PMC2435550/ /pubmed/18510743 http://dx.doi.org/10.1186/1471-2105-9-251 Text en Copyright © 2008 Zhang et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology Article Zhang, Min Zhang, Dabao Wells, Martin T Variable selection for large p small n regression models with incomplete data: Mapping QTL with epistases |
title | Variable selection for large p small n regression models with incomplete data: Mapping QTL with epistases |
title_full | Variable selection for large p small n regression models with incomplete data: Mapping QTL with epistases |
title_fullStr | Variable selection for large p small n regression models with incomplete data: Mapping QTL with epistases |
title_full_unstemmed | Variable selection for large p small n regression models with incomplete data: Mapping QTL with epistases |
title_short | Variable selection for large p small n regression models with incomplete data: Mapping QTL with epistases |
title_sort | variable selection for large p small n regression models with incomplete data: mapping qtl with epistases |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2435550/ https://www.ncbi.nlm.nih.gov/pubmed/18510743 http://dx.doi.org/10.1186/1471-2105-9-251 |
work_keys_str_mv | AT zhangmin variableselectionforlargepsmallnregressionmodelswithincompletedatamappingqtlwithepistases AT zhangdabao variableselectionforlargepsmallnregressionmodelswithincompletedatamappingqtlwithepistases AT wellsmartint variableselectionforlargepsmallnregressionmodelswithincompletedatamappingqtlwithepistases |