Simulation Studies as Designed Experiments: The Comparison of Penalized Regression Models in the “Large p, Small n” Setting

New algorithms are continuously proposed in computational biology. Performance evaluation of novel methods is important in practice. Nonetheless, the field experiences a lack of rigorous methodology aimed to systematically and objectively evaluate competing approaches. Simulation studies are frequen...

Bibliographic Details
Main Authors:	Chaibub Neto, Elias, Bare, J. Christopher, Margolin, Adam A.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2014
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4188526/ https://www.ncbi.nlm.nih.gov/pubmed/25289666 http://dx.doi.org/10.1371/journal.pone.0107957

_version_	1782338242029289472
author	Chaibub Neto, Elias Bare, J. Christopher Margolin, Adam A.
author_facet	Chaibub Neto, Elias Bare, J. Christopher Margolin, Adam A.
author_sort	Chaibub Neto, Elias
collection	PubMed
description	New algorithms are continuously proposed in computational biology. Performance evaluation of novel methods is important in practice. Nonetheless, the field experiences a lack of rigorous methodology aimed to systematically and objectively evaluate competing approaches. Simulation studies are frequently used to show that a particular method outperforms another. Often times, however, simulation studies are not well designed, and it is hard to characterize the particular conditions under which different methods perform better. In this paper we propose the adoption of well established techniques in the design of computer and physical experiments for developing effective simulation studies. By following best practices in planning of experiments we are better able to understand the strengths and weaknesses of competing algorithms leading to more informed decisions about which method to use for a particular task. We illustrate the application of our proposed simulation framework with a detailed comparison of the ridge-regression, lasso and elastic-net algorithms in a large scale study investigating the effects on predictive performance of sample size, number of features, true model sparsity, signal-to-noise ratio, and feature correlation, in situations where the number of covariates is usually much larger than sample size. Analysis of data sets containing tens of thousands of features but only a few hundred samples is nowadays routine in computational biology, where “omics” features such as gene expression, copy number variation and sequence data are frequently used in the predictive modeling of complex phenotypes such as anticancer drug response. The penalized regression approaches investigated in this study are popular choices in this setting and our simulations corroborate well established results concerning the conditions under which each one of these methods is expected to perform best while providing several novel insights.
format	Online Article Text
id	pubmed-4188526
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-41885262014-10-10 Simulation Studies as Designed Experiments: The Comparison of Penalized Regression Models in the “Large p, Small n” Setting Chaibub Neto, Elias Bare, J. Christopher Margolin, Adam A. PLoS One Research Article New algorithms are continuously proposed in computational biology. Performance evaluation of novel methods is important in practice. Nonetheless, the field experiences a lack of rigorous methodology aimed to systematically and objectively evaluate competing approaches. Simulation studies are frequently used to show that a particular method outperforms another. Often times, however, simulation studies are not well designed, and it is hard to characterize the particular conditions under which different methods perform better. In this paper we propose the adoption of well established techniques in the design of computer and physical experiments for developing effective simulation studies. By following best practices in planning of experiments we are better able to understand the strengths and weaknesses of competing algorithms leading to more informed decisions about which method to use for a particular task. We illustrate the application of our proposed simulation framework with a detailed comparison of the ridge-regression, lasso and elastic-net algorithms in a large scale study investigating the effects on predictive performance of sample size, number of features, true model sparsity, signal-to-noise ratio, and feature correlation, in situations where the number of covariates is usually much larger than sample size. Analysis of data sets containing tens of thousands of features but only a few hundred samples is nowadays routine in computational biology, where “omics” features such as gene expression, copy number variation and sequence data are frequently used in the predictive modeling of complex phenotypes such as anticancer drug response. 
The penalized regression approaches investigated in this study are popular choices in this setting and our simulations corroborate well established results concerning the conditions under which each one of these methods is expected to perform best while providing several novel insights. Public Library of Science 2014-10-07 /pmc/articles/PMC4188526/ /pubmed/25289666 http://dx.doi.org/10.1371/journal.pone.0107957 Text en © 2014 Chaibub Neto et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle	Research Article Chaibub Neto, Elias Bare, J. Christopher Margolin, Adam A. Simulation Studies as Designed Experiments: The Comparison of Penalized Regression Models in the “Large p, Small n” Setting
title	Simulation Studies as Designed Experiments: The Comparison of Penalized Regression Models in the “Large p, Small n” Setting
title_full	Simulation Studies as Designed Experiments: The Comparison of Penalized Regression Models in the “Large p, Small n” Setting
title_fullStr	Simulation Studies as Designed Experiments: The Comparison of Penalized Regression Models in the “Large p, Small n” Setting
title_full_unstemmed	Simulation Studies as Designed Experiments: The Comparison of Penalized Regression Models in the “Large p, Small n” Setting
title_short	Simulation Studies as Designed Experiments: The Comparison of Penalized Regression Models in the “Large p, Small n” Setting
title_sort	simulation studies as designed experiments: the comparison of penalized regression models in the “large p, small n” setting
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4188526/ https://www.ncbi.nlm.nih.gov/pubmed/25289666 http://dx.doi.org/10.1371/journal.pone.0107957
work_keys_str_mv	AT chaibubnetoelias simulationstudiesasdesignedexperimentsthecomparisonofpenalizedregressionmodelsinthelargepsmallnsetting AT barejchristopher simulationstudiesasdesignedexperimentsthecomparisonofpenalizedregressionmodelsinthelargepsmallnsetting AT margolinadama simulationstudiesasdesignedexperimentsthecomparisonofpenalizedregressionmodelsinthelargepsmallnsetting

Similar Items