
Comparison of methods for early-readmission prediction in a high-dimensional heterogeneous covariates and time-to-event outcome framework

BACKGROUND: Choosing the best-performing method in terms of outcome prediction or variable selection is a recurring problem in prognosis studies, leading to many publications on method comparison. But some aspects have received little attention. First, most comparison studies treat prediction perf...


Bibliographic Details
Main Authors:	Bussy, Simon; Veil, Raphaël; Looten, Vincent; Burgun, Anita; Gaïffas, Stéphane; Guilloux, Agathe; Ranque, Brigitte; Jannot, Anne-Sophie
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2019
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6404305/ https://www.ncbi.nlm.nih.gov/pubmed/30841867 http://dx.doi.org/10.1186/s12874-019-0673-4

author	Bussy, Simon; Veil, Raphaël; Looten, Vincent; Burgun, Anita; Gaïffas, Stéphane; Guilloux, Agathe; Ranque, Brigitte; Jannot, Anne-Sophie
collection	PubMed
description	BACKGROUND: Choosing the best-performing method in terms of outcome prediction or variable selection is a recurring problem in prognosis studies, leading to many publications on method comparison. But some aspects have received little attention. First, most comparison studies treat prediction performance and variable selection separately. Second, methods are compared either within a binary outcome setting (where we want to predict whether the readmission will occur within an arbitrarily chosen delay or not) or within a survival analysis setting (where the outcomes are directly the censored times), but not both. In this paper, we propose a comparison methodology to weigh up those different settings in terms of both prediction and variable selection, while incorporating advanced machine learning strategies. METHODS: Using a high-dimensional case study on a sickle-cell disease (SCD) cohort, we compare 8 statistical methods. In the binary outcome setting, we consider logistic regression (LR), support vector machine (SVM), random forest (RF), gradient boosting (GB) and neural network (NN); in the survival analysis setting, we consider the Cox Proportional Hazards (PH), the CURE and the C-mix models. We also propose a method using Gaussian Processes to extract meaningful structured covariates from longitudinal data. RESULTS: Among all assessed statistical methods, the survival analysis ones obtain the best results. In particular, the C-mix model yields the best performance in both considered settings (AUC = 0.94 in the binary outcome setting), as well as interesting interpretation aspects. There is some consistency in selected covariates across methods within a setting, but not much across the two settings. CONCLUSIONS: It appears that learning within the survival analysis setting first (so using all the temporal information), and then going back to a binary prediction using the survival estimates, gives significantly better prediction performance than that obtained by models trained “directly” within the binary outcome setting. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12874-019-0673-4) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-6404305
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6404305 2019-03-18 BMC Med Res Methodol Research Article BioMed Central 2019-03-06 /pmc/articles/PMC6404305/ /pubmed/30841867 http://dx.doi.org/10.1186/s12874-019-0673-4 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Comparison of methods for early-readmission prediction in a high-dimensional heterogeneous covariates and time-to-event outcome framework
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6404305/ https://www.ncbi.nlm.nih.gov/pubmed/30841867 http://dx.doi.org/10.1186/s12874-019-0673-4


Similar Items