A comparison of penalised regression methods for informing the selection of predictive markers

BACKGROUND: Penalised regression methods are a useful atheoretical approach for both developing predictive models and selecting key indicators within an often substantially larger pool of available indicators. In comparison to traditional methods, penalised regression models improve prediction in ne...

Bibliographic Details
Main authors: Greenwood, Christopher J., Youssef, George J., Letcher, Primrose, Macdonald, Jacqui A., Hagg, Lauryn J., Sanson, Ann, Mcintosh, Jenn, Hutchinson, Delyse M., Toumbourou, John W., Fuller-Tyszkiewicz, Matthew, Olsson, Craig A.
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7678959/
https://www.ncbi.nlm.nih.gov/pubmed/33216811
http://dx.doi.org/10.1371/journal.pone.0242730
collection PubMed
description BACKGROUND: Penalised regression methods are a useful atheoretical approach for both developing predictive models and selecting key indicators within an often substantially larger pool of available indicators. In comparison to traditional methods, penalised regression models improve prediction in new data by shrinking the size of coefficients and retaining those with coefficients greater than zero. However, the performance and selection of indicators depend on the specific algorithm implemented. The purpose of this study was to examine the predictive performance and feature (i.e., indicator) selection capability of common penalised logistic regression methods (LASSO, adaptive LASSO, and elastic-net), compared with traditional logistic regression and forward selection methods.
DESIGN: Data were drawn from the Australian Temperament Project, a multigenerational longitudinal study established in 1983. The analytic sample consisted of 1,292 (707 women) participants. A total of 102 adolescent psychosocial and contextual indicators were available to predict young adult daily smoking.
FINDINGS: Penalised logistic regression methods showed small improvements in predictive performance over logistic regression and forward selection. However, no single penalised logistic regression model outperformed the others. Elastic-net models selected more indicators than either LASSO or adaptive LASSO. Additionally, more regularised models included fewer indicators, yet had comparable predictive performance. Forward selection methods dismissed many indicators identified as important in the penalised logistic regression models.
CONCLUSIONS: Although overall predictive accuracy was only marginally better with penalised logistic regression methods, benefits were most clear in their capacity to select a manageable subset of indicators. Preference among competing penalised logistic regression methods may therefore be guided by feature selection capability, and thus interpretative considerations, rather than predictive performance alone.
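The selection behaviour summarised in the abstract (coefficients are shrunk, and only indicators whose coefficients stay non-zero are retained) can be illustrated with a minimal sketch. This is not the authors' analysis. It assumes simulated data rather than the Australian Temperament Project, a plain LASSO penalty fitted by proximal gradient descent, and hypothetical names (`fit_l1_logistic`, `simulate`, `count_selected`, `penalty`) chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(value, threshold):
    # Proximal operator of the L1 penalty: shrink towards zero, clip at zero.
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(X, y, penalty, step=0.5, iterations=300):
    """Minimise mean log-loss + penalty * sum(|coef|) by proximal gradient.

    The intercept is left unpenalised. Step size and iteration count are
    illustrative assumptions suited to standardised predictors.
    """
    n, p = len(X), len(X[0])
    intercept, coef = 0.0, [0.0] * p
    for _ in range(iterations):
        # Residuals (predicted probability minus outcome) at the current fit.
        residuals = [
            sigmoid(intercept + sum(c * x for c, x in zip(coef, row))) - target
            for row, target in zip(X, y)
        ]
        intercept -= step * sum(residuals) / n
        for j in range(p):
            gradient = sum(r * row[j] for r, row in zip(residuals, X)) / n
            coef[j] = soft_threshold(coef[j] - step * gradient, step * penalty)
    return intercept, coef


def simulate(n=200, p=8, seed=0):
    # Hypothetical data: only the first two indicators carry signal.
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [
        1 if rng.random() < sigmoid(1.5 * row[0] - 1.5 * row[1]) else 0
        for row in X
    ]
    return X, y


def count_selected(coef):
    # An indicator counts as "selected" when its coefficient is not zero.
    return sum(1 for c in coef if c != 0.0)


X, y = simulate()
for penalty in (0.0, 0.02, 0.1):
    _, coef = fit_l1_logistic(X, y, penalty)
    print(f"penalty={penalty}: {count_selected(coef)} of {len(coef)} indicators retained")
```

Raising `penalty` in this sketch corresponds to the "more regularised" models in the abstract. An elastic-net variant would add a squared (ridge) term to the penalty, which is one reason it may retain more correlated indicators than LASSO.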
id pubmed-7678959
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-7678959 2020-12-02
PLoS One, Research Article
Public Library of Science 2020-11-20
Text en © 2020 Greenwood et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.