Large-scale benchmark study of survival prediction methods using multi-omics data

Multi-omics data, that is, datasets containing different types of high-dimensional molecular variables, are increasingly often generated for the investigation of various diseases. Nevertheless, questions remain regarding the usefulness of multi-omics data for the prediction of disease outcomes such...

Bibliographic Details
Main authors:	Herrmann, Moritz, Probst, Philipp, Hornung, Roman, Jurinovic, Vindi, Boulesteix, Anne-Laure
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2020
Subjects:	Method Review
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8138887/ https://www.ncbi.nlm.nih.gov/pubmed/32823283 http://dx.doi.org/10.1093/bib/bbaa167

_version_	1783695895339991040
author	Herrmann, Moritz Probst, Philipp Hornung, Roman Jurinovic, Vindi Boulesteix, Anne-Laure
author_facet	Herrmann, Moritz Probst, Philipp Hornung, Roman Jurinovic, Vindi Boulesteix, Anne-Laure
author_sort	Herrmann, Moritz
collection	PubMed
description	Multi-omics data, that is, datasets containing different types of high-dimensional molecular variables, are increasingly often generated for the investigation of various diseases. Nevertheless, questions remain regarding the usefulness of multi-omics data for the prediction of disease outcomes such as survival time. It is also unclear which methods are most appropriate to derive such prediction models. We aim to give some answers to these questions through a large-scale benchmark study using real data. Different prediction methods from machine learning and statistics were applied on 18 multi-omics cancer datasets (35 to 1000 observations, up to 100 000 variables) from the database ‘The Cancer Genome Atlas’ (TCGA). The considered outcome was the (censored) survival time. Eleven methods based on boosting, penalized regression and random forest were compared, comprising both methods that do and that do not take the group structure of the omics variables into account. The Kaplan–Meier estimate and a Cox model using only clinical variables were used as reference methods. The methods were compared using several repetitions of 5-fold cross-validation. Uno’s C-index and the integrated Brier score served as performance metrics. The results indicate that methods taking into account the multi-omics structure have a slightly better prediction performance. Taking this structure into account can protect the predictive information in low-dimensional groups—especially clinical variables—from not being exploited during prediction. Moreover, only the block forest method outperformed the Cox model on average, and only slightly. This indicates, as a by-product of our study, that in the considered TCGA studies the utility of multi-omics data for prediction purposes was limited. Contact: moritz.herrmann@stat.uni-muenchen.de, +49 89 2180 3198 Supplementary information: Supplementary data are available at Briefings in Bioinformatics online. 
All analyses are reproducible using R code freely available on GitHub.
format	Online Article Text
id	pubmed-8138887
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-8138887 2021-05-25 Large-scale benchmark study of survival prediction methods using multi-omics data Herrmann, Moritz Probst, Philipp Hornung, Roman Jurinovic, Vindi Boulesteix, Anne-Laure Brief Bioinform Method Review Oxford University Press 2020-08-22 /pmc/articles/PMC8138887/ /pubmed/32823283 http://dx.doi.org/10.1093/bib/bbaa167 Text en © The Author(s) 2020. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle	Method Review Herrmann, Moritz Probst, Philipp Hornung, Roman Jurinovic, Vindi Boulesteix, Anne-Laure Large-scale benchmark study of survival prediction methods using multi-omics data
title	Large-scale benchmark study of survival prediction methods using multi-omics data
title_full	Large-scale benchmark study of survival prediction methods using multi-omics data
title_fullStr	Large-scale benchmark study of survival prediction methods using multi-omics data
title_full_unstemmed	Large-scale benchmark study of survival prediction methods using multi-omics data
title_short	Large-scale benchmark study of survival prediction methods using multi-omics data
title_sort	large-scale benchmark study of survival prediction methods using multi-omics data
topic	Method Review
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8138887/ https://www.ncbi.nlm.nih.gov/pubmed/32823283 http://dx.doi.org/10.1093/bib/bbaa167
