Flexible co‐data learning for high‐dimensional prediction

Clinical research often focuses on complex traits in which many variables play a role in mechanisms driving, or curing, diseases. Clinical prediction is hard when data is high‐dimensional, but additional information, like domain knowledge and previously published studies, may be helpful to improve p...

Bibliographic Details
Main authors:	van Nee, Mirrelijn M., Wessels, Lodewyk F.A., van de Wiel, Mark A.
Format:	Online Article Text
Language:	English
Published:	John Wiley and Sons Inc. 2021
Subjects:	Research Articles
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9292202/ https://www.ncbi.nlm.nih.gov/pubmed/34438466 http://dx.doi.org/10.1002/sim.9162

author	van Nee, Mirrelijn M. Wessels, Lodewyk F.A. van de Wiel, Mark A.
collection	PubMed
description	Clinical research often focuses on complex traits in which many variables play a role in mechanisms driving, or curing, diseases. Clinical prediction is hard when data is high‐dimensional, but additional information, like domain knowledge and previously published studies, may be helpful to improve predictions. Such complementary data, or co‐data, provide information on the covariates, such as genomic location or P‐values from external studies. We use multiple and various co‐data to define possibly overlapping or hierarchically structured groups of covariates. These are then used to estimate adaptive multi‐group ridge penalties for generalized linear and Cox models. Available group adaptive methods primarily target settings with few groups; they are therefore likely to overfit when groups are non‐informative, correlated or numerous, and they do not account for known structure at the group level. To handle these issues, our method combines empirical Bayes estimation of the hyperparameters with an extra level of flexible shrinkage. This renders a uniquely flexible framework as any type of shrinkage can be used on the group level. We describe various types of co‐data and propose suitable forms of hypershrinkage. The method is very versatile, as it allows for integration and weighting of multiple co‐data sets, inclusion of unpenalized covariates and posterior variable selection. For three cancer genomics applications we demonstrate improvements compared to other models in terms of performance, variable selection stability and validation.
format	Online Article Text
id	pubmed-9292202
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	John Wiley and Sons Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-9292202 2022-07-20 Flexible co‐data learning for high‐dimensional prediction van Nee, Mirrelijn M. Wessels, Lodewyk F.A. van de Wiel, Mark A. Stat Med Research Articles John Wiley and Sons Inc. 2021-08-26 2021-11-20 /pmc/articles/PMC9292202/ /pubmed/34438466 http://dx.doi.org/10.1002/sim.9162 Text en © 2021 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial‐NoDerivs License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made.
title	Flexible co‐data learning for high‐dimensional prediction
topic	Research Articles
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9292202/ https://www.ncbi.nlm.nih.gov/pubmed/34438466 http://dx.doi.org/10.1002/sim.9162