Bagging survival tree procedure for variable selection and prediction in the presence of nonsusceptible patients

Bibliographic Details
Main Authors: Mbogning, Cyprien, Broët, Philippe
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4895817/
https://www.ncbi.nlm.nih.gov/pubmed/27266372
http://dx.doi.org/10.1186/s12859-016-1090-x
Description
Summary: BACKGROUND: For clinical genomic studies with high-dimensional datasets, tree-based ensemble methods offer a powerful solution for variable selection and prediction, taking into account the complex interrelationships between explanatory variables. One of the key components of the tree-building process is the splitting criterion. For survival data, the classical splitting criterion is the Logrank statistic. However, the presence of a fraction of nonsusceptible patients in the studied population advocates for considering a criterion tailored to this particular situation. RESULTS: We propose a bagging survival tree procedure for variable selection and prediction where the survival tree-building process relies on a splitting criterion that explicitly focuses on the time-to-event survival distribution among susceptible patients. A simulation study shows that our method achieves good performance for variable selection and prediction. Different criteria for evaluating the importance of the explanatory variables and the prediction performance are reported. Our procedure is illustrated on a genomic dataset with gene expression measurements from early breast cancer patients. CONCLUSIONS: In the presence of nonsusceptible patients among the studied population, our procedure represents an efficient way to select event-related explanatory covariates with potential higher-order interactions and to identify homogeneous groups of susceptible patients.
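
To make the two building blocks named in the summary more concrete, the following is a minimal sketch of the classical Logrank splitting criterion and of the bootstrap resampling step that underlies bagging. It is not the authors' procedure: their splitting criterion, which targets the survival distribution among susceptible patients, is not specified in this record and is not reproduced here. The function names (`logrank_statistic`, `best_split`, `bootstrap_indices`) and the toy data in the usage note are assumptions made purely for illustration.

```python
import random


def logrank_statistic(times, events, in_left):
    """Two-sample Logrank chi-square statistic for one candidate split.

    times   : observed follow-up times
    events  : 1 if the event was observed, 0 if censored
    in_left : True if the patient falls in the left child node
    """
    observed_minus_expected = 0.0
    variance = 0.0
    # Loop over the distinct times at which at least one event occurred
    for t in sorted({u for u, e in zip(times, events) if e == 1}):
        # Patients still at risk just before t, overall and in the left child
        n = sum(1 for u in times if u >= t)
        n_left = sum(1 for u, g in zip(times, in_left) if u >= t and g)
        # Events occurring exactly at t, overall and in the left child
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        d_left = sum(
            1 for u, e, g in zip(times, events, in_left) if u == t and e == 1 and g
        )
        observed_minus_expected += d_left - d * n_left / n
        if n > 1:
            # Hypergeometric variance of the left-child event count at time t
            variance += d * (n_left / n) * (1 - n_left / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance if variance > 0 else 0.0


def best_split(times, events, x):
    """Return (threshold, statistic) maximising the Logrank statistic for covariate x."""
    best = (None, 0.0)
    # The largest value is skipped because it would leave the right child empty
    for c in sorted(set(x))[:-1]:
        stat = logrank_statistic(times, events, [v <= c for v in x])
        if stat > best[1]:
            best = (c, stat)
    return best


def bootstrap_indices(n, rng):
    """Draw one bootstrap sample (with replacement) of size n, as used in bagging."""
    return [rng.randrange(n) for _ in range(n)]
```

In a bagging procedure, one tree would be grown on each bootstrap sample, for example by calling `best_split` on every covariate at every node, and predictions would then be aggregated across trees. With hypothetical toy data, `best_split([1, 2, 3, 4], [1, 1, 1, 1], [0.1, 0.2, 0.9, 0.8])` scans the candidate thresholds of a single covariate and returns the one with the largest Logrank statistic.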