Comparison of the modified unbounded penalty and the LASSO to select predictive genes of response to chemotherapy in breast cancer

Covariate selection is a fundamental step when building sparse prediction models in order to avoid overfitting and to gain a better interpretation of the classifier without losing its predictive accuracy. In practice the LASSO regression of Tibshirani, which penalizes the likelihood of the model by...

Bibliographic Details
Main Authors: Collignon, Olivier; Han, Jeongseop; An, Hyungmi; Oh, Seungyoung; Lee, Youngjo
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6166949/
https://www.ncbi.nlm.nih.gov/pubmed/30273405
http://dx.doi.org/10.1371/journal.pone.0204897
author Collignon, Olivier
Han, Jeongseop
An, Hyungmi
Oh, Seungyoung
Lee, Youngjo
author_sort Collignon, Olivier
collection PubMed
description Covariate selection is a fundamental step when building sparse prediction models in order to avoid overfitting and to gain a better interpretation of the classifier without losing its predictive accuracy. In practice the LASSO regression of Tibshirani, which penalizes the likelihood of the model by the L1 norm of the regression coefficients, has become the gold standard for reaching these objectives. Recently Lee and Oh developed a novel random-effect covariate selection method called the modified unbounded penalty (MUB) regression, whose penalization function can equal minus infinity at 0 in order to produce very sparse models. We sought to compare the predictive accuracy and the number of covariates selected by these two methods in several high-dimensional datasets consisting of gene expression levels measured to predict response to chemotherapy in breast cancer patients. These comparisons were performed by building the Receiver Operating Characteristic (ROC) curves of the classifiers obtained with the selected genes and by comparing their area under the ROC curve (AUC), corrected for optimism using several variants of bootstrap internal validation and cross-validation. We found consistently in all datasets that the MUB penalization selected a remarkably smaller number of covariates than the LASSO while offering a similar, and encouraging, predictive accuracy. The models selected by the MUB were actually nested in the ones obtained with the LASSO. Similar findings were observed when comparing these results to those reported by other authors in the first publications of these datasets, or when using the area under the Precision-Recall curve (AUCPR) as another measure of predictive performance. In conclusion, the MUB penalization appears to be one of the best options when sparsity is required in high-dimensional settings. However, further investigation in other datasets is required to validate these findings.
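The description above mentions AUCs "corrected for optimism" by bootstrap internal validation. As a rough illustration only, the sketch below shows one common variant, the Harrell-style optimism bootstrap; the record does not say which variants the authors used, so this choice is an assumption. The classifier is a deliberately simple stand-in (difference of class means), not the LASSO or the MUB, and all function names (auc, fit, predict, optimism_corrected_auc) are hypothetical.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count as 1/2)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit(X, y):
    """Stand-in classifier (assumption): one weight per feature, equal to the
    difference between the class-1 mean and the class-0 mean."""
    pos = [row for row, lab in zip(X, y) if lab == 1]
    neg = [row for row, lab in zip(X, y) if lab == 0]
    return [
        sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg)
        for j in range(len(X[0]))
    ]


def predict(weights, X):
    """Linear score for each row of X."""
    return [sum(w * v for w, v in zip(weights, row)) for row in X]


def optimism_corrected_auc(X, y, n_boot=200, seed=0):
    """Return (apparent AUC, optimism-corrected AUC)."""
    rng = random.Random(seed)
    n = len(y)
    # Apparent AUC: model fitted and evaluated on the same data.
    apparent = auc(predict(fit(X, y), X), y)
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # AUC is undefined when a resample holds a single class
        Xb = [X[i] for i in idx]
        w = fit(Xb, yb)
        # Optimism: AUC on the bootstrap sample minus AUC on the original data.
        optimism.append(auc(predict(w, Xb), yb) - auc(predict(w, X), y))
    if not optimism:
        raise ValueError("no usable bootstrap resample")
    return apparent, apparent - sum(optimism) / len(optimism)


if __name__ == "__main__":
    # Simulated noise data, for illustration only (not the study's datasets).
    rng = random.Random(1)
    y = [i % 2 for i in range(40)]
    X = [[rng.gauss(0, 1) for _ in range(25)] for _ in y]
    print(optimism_corrected_auc(X, y))
```

In a real analysis the whole model-building procedure, including the covariate selection step, would be repeated inside each bootstrap resample; the stand-in `fit` plays that role here.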
format Online
Article
Text
id pubmed-6166949
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6166949 2018-10-19 Comparison of the modified unbounded penalty and the LASSO to select predictive genes of response to chemotherapy in breast cancer Collignon, Olivier Han, Jeongseop An, Hyungmi Oh, Seungyoung Lee, Youngjo PLoS One Research Article Public Library of Science 2018-10-01 /pmc/articles/PMC6166949/ /pubmed/30273405 http://dx.doi.org/10.1371/journal.pone.0204897 Text en © 2018 Collignon et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Comparison of the modified unbounded penalty and the LASSO to select predictive genes of response to chemotherapy in breast cancer
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6166949/
https://www.ncbi.nlm.nih.gov/pubmed/30273405
http://dx.doi.org/10.1371/journal.pone.0204897