Comparison of the modified unbounded penalty and the LASSO to select predictive genes of response to chemotherapy in breast cancer
Covariate selection is a fundamental step when building sparse prediction models in order to avoid overfitting and to gain a better interpretation of the classifier without losing its predictive accuracy. In practice the LASSO regression of Tibshirani, which penalizes the likelihood of the model by...
Main authors: | Collignon, Olivier; Han, Jeongseop; An, Hyungmi; Oh, Seungyoung; Lee, Youngjo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
Public Library of Science
2018
|
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6166949/ https://www.ncbi.nlm.nih.gov/pubmed/30273405 http://dx.doi.org/10.1371/journal.pone.0204897 |
_version_ | 1783360116409499648 |
---|---|
author | Collignon, Olivier Han, Jeongseop An, Hyungmi Oh, Seungyoung Lee, Youngjo |
author_sort | Collignon, Olivier |
collection | PubMed |
description | Covariate selection is a fundamental step when building sparse prediction models in order to avoid overfitting and to gain a better interpretation of the classifier without losing its predictive accuracy. In practice the LASSO regression of Tibshirani, which penalizes the likelihood of the model by the L1 norm of the regression coefficients, has become the gold standard for reaching these objectives. Recently Lee and Oh developed a novel random-effect covariate selection method called the modified unbounded penalty (MUB) regression, whose penalization function can equal minus infinity at 0 in order to produce very sparse models. We sought to compare the predictive accuracy and the number of covariates selected by these two methods in several high-dimensional datasets consisting of gene expressions measured to predict response to chemotherapy in breast cancer patients. These comparisons were performed by building the Receiver Operating Characteristic (ROC) curves of the classifiers obtained with the selected genes and by comparing their area under the ROC curve (AUC), corrected for optimism using several variants of bootstrap internal validation and cross-validation. We found consistently in all datasets that the MUB penalization selected a remarkably smaller number of covariates than the LASSO while offering a similar, and encouraging, predictive accuracy. The models selected by the MUB were in fact nested in those obtained with the LASSO. Similar findings were observed when comparing these results to those obtained by other authors in the first publication of each dataset, or when using the area under the Precision-Recall curve (AUCPR) as another measure of predictive performance. In conclusion, the MUB penalization therefore seems to be one of the best options when sparsity is required in high-dimensional settings. Further investigation in other datasets is, however, required to validate these findings. |
format | Online Article Text |
id | pubmed-6166949 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-6166949 2018-10-19 PLoS One Research Article Public Library of Science 2018-10-01 /pmc/articles/PMC6166949/ /pubmed/30273405 http://dx.doi.org/10.1371/journal.pone.0204897 Text en © 2018 Collignon et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Comparison of the modified unbounded penalty and the LASSO to select predictive genes of response to chemotherapy in breast cancer |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6166949/ https://www.ncbi.nlm.nih.gov/pubmed/30273405 http://dx.doi.org/10.1371/journal.pone.0204897 |
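
The abstract in this record compares classifiers by their AUC "corrected for optimism" through bootstrap internal validation. As a rough illustration of that idea only, the sketch below computes the AUC by pair counting and applies the standard Harrell-style bootstrap correction (apparent AUC minus the mean optimism). It is not the authors' code: the record does not say which bootstrap variants were used, the function names are hypothetical, and `fit_mean_difference` is a deliberately trivial stand-in for a penalized regression fit such as the LASSO or the MUB.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve: the fraction of (responder, non-responder)
    pairs ranked correctly, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def fit_mean_difference(rows, labels):
    """Hypothetical toy model (NOT the LASSO or the MUB): weight each feature
    by the difference of its class means and return a scoring function."""
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    weights = [
        sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg)
        for j in range(len(rows[0]))
    ]
    return lambda new_rows: [sum(w * v for w, v in zip(weights, r)) for r in new_rows]


def optimism_corrected_auc(rows, labels, fit, n_boot=200, seed=0):
    """Apparent AUC minus the average optimism over n_boot bootstrap samples."""
    rng = random.Random(seed)
    n = len(labels)
    apparent = auc(fit(rows, labels)(rows), labels)
    optimism = []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [labels[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # a one-class resample cannot be scored; draw again
        xb = [rows[i] for i in idx]
        score = fit(xb, yb)
        # Optimism: performance on the resample minus performance on the original data.
        optimism.append(auc(score(xb), yb) - auc(score(rows), labels))
    return apparent - sum(optimism) / n_boot


# Simulated data for illustration only: one informative feature, one noise feature.
rng = random.Random(1)
labels = [i % 2 for i in range(40)]
rows = [[rng.gauss(y, 1.0), rng.gauss(0.0, 1.0)] for y in labels]
print(optimism_corrected_auc(rows, labels, fit_mean_difference))
```

Passing the fitting routine as the `fit` argument keeps the validation loop independent of the model, so the same loop could in principle wrap any selection method; the whole fitting procedure, including covariate selection, must be repeated inside each bootstrap sample for the correction to be meaningful.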