A plea for taking all available clinical information into account when assessing the predictive value of omics data

Bibliographic Details
Main Authors: Volkmann, Alexander; De Bin, Riccardo; Sauerbrei, Willi; Boulesteix, Anne-Laure
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6657034/
https://www.ncbi.nlm.nih.gov/pubmed/31340753
http://dx.doi.org/10.1186/s12874-019-0802-0
author Volkmann, Alexander
De Bin, Riccardo
Sauerbrei, Willi
Boulesteix, Anne-Laure
collection PubMed
description BACKGROUND: Omics data can be very informative in survival analysis and may improve the prognostic ability of classical models based on clinical risk factors for various diseases, for example breast cancer. Recent research has focused on integrating omics and clinical data, yet has often ignored the need for appropriate model building for clinical variables. Both the medical literature on classical prognostic scores and the biostatistical literature on appropriate model selection strategies for low-dimensional (clinical) data are often ignored in the context of omics research. The goal of this paper is to fill this methodological gap by investigating the added predictive value of gene expression data for models using varying amounts of clinical information.
METHODS: We analyze two data sets from the field of survival prognosis of breast cancer patients. First, we construct several proportional hazards prediction models using varying amounts of clinical information based on established medical knowledge. These models are then used as a starting point (i.e., included as a clinical offset) for identifying informative gene expression variables using resampling procedures and penalized regression approaches (model-based boosting and the LASSO). To assess the added predictive value of the gene signatures, measures of prediction accuracy and separation are examined on a validation data set, both for the clinical models and for the models that combine the two sources of information.
RESULTS: For one data set, we do not find any substantial added predictive value of the omics data when compared to clinical models. On the second data set, we identify a noticeable added predictive value, but only for scenarios where little or no clinical information is included in the modeling process. We find that including more clinical information can lead to a smaller number of selected omics predictors.
CONCLUSIONS: New research using omics data should include all available established medical knowledge in order to allow an adequate evaluation of the added predictive value of omics data. Including all relevant clinical information in the analysis might also lead to more parsimonious models. The developed procedure to assess the predictive value of the omics data can be readily applied to other scenarios.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12874-019-0802-0) contains supplementary material, which is available to authorized users.
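The two-step idea described under METHODS (fix the clinical linear predictor as an offset, then select gene expression variables by penalized regression) can be illustrated with a minimal sketch. It is not the authors' implementation: the data are simulated, all names (`neg_partial_loglik`, `fit_cox_l1`, `clin_offset`, the penalty value `lam=0.05`) are hypothetical, a plain proximal-gradient solver stands in for dedicated LASSO or boosting software, and tied event times are assumed absent.

```python
import numpy as np

def neg_partial_loglik(gamma, Z, offset, time, event):
    """Mean negative Cox partial log-likelihood and its gradient in gamma.

    The offset enters the linear predictor with a fixed coefficient of 1.
    Assumes there are no tied event times.
    """
    eta = offset + Z @ gamma
    order = np.argsort(-time)                      # latest time first
    eta_o, Z_o, ev_o = eta[order], Z[order], event[order]
    shift = eta_o.max()                            # for numerical stability
    w = np.exp(eta_o - shift)
    risk_w = np.cumsum(w)                          # weight sum over each risk set
    risk_wz = np.cumsum(w[:, None] * Z_o, axis=0)  # weighted covariate sums
    loglik = np.sum(ev_o * (eta_o - shift - np.log(risk_w)))
    grad = np.sum(ev_o[:, None] * (Z_o - risk_wz / risk_w[:, None]), axis=0)
    return -loglik / len(time), -grad / len(time)

def fit_cox_l1(Z, offset, time, event, lam, n_iter=2000):
    """L1-penalized Cox fit by proximal gradient descent (soft-thresholding)."""
    # Conservative step size: max squared row norm bounds the curvature.
    step = 1.0 / np.max(np.sum(Z ** 2, axis=1))
    gamma = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        _, grad = neg_partial_loglik(gamma, Z, offset, time, event)
        u = gamma - step * grad
        gamma = np.sign(u) * np.maximum(np.abs(u) - step * lam, 0.0)
    return gamma

# Simulated toy data (hypothetical): one clinical variable drives survival,
# and gene 0 is largely a proxy for that clinical variable.
rng = np.random.default_rng(0)
n, p = 200, 30
x_clin = rng.normal(size=(n, 1))
Z = rng.normal(size=(n, p))
Z[:, 0] = 0.8 * x_clin[:, 0] + 0.6 * Z[:, 0]
t_event = rng.exponential(1.0 / np.exp(x_clin[:, 0]))
t_cens = rng.exponential(2.0, size=n)
time = np.minimum(t_event, t_cens)
event = (t_event <= t_cens).astype(float)

# Step 1: unpenalized clinical model (lam=0), turned into a fixed offset.
beta_clin = fit_cox_l1(x_clin, np.zeros(n), time, event, lam=0.0)
clin_offset = x_clin @ beta_clin

# Step 2: select genes without and with the clinical offset.
gamma_no_clin = fit_cox_l1(Z, np.zeros(n), time, event, lam=0.05)
gamma_offset = fit_cox_l1(Z, clin_offset, time, event, lam=0.05)
print("genes selected without clinical offset:", np.count_nonzero(gamma_no_clin))
print("genes selected with clinical offset:", np.count_nonzero(gamma_offset))
```

Comparing the two selection counts on toy data mirrors the kind of comparison the abstract reports. A real analysis would tune the penalty by resampling and judge added value on a separate validation set, as the METHODS section describes.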
format Online
Article
Text
id pubmed-6657034
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6657034 2019-07-31
BMC Med Res Methodol
Research Article
BioMed Central 2019-07-24
/pmc/articles/PMC6657034/ /pubmed/31340753 http://dx.doi.org/10.1186/s12874-019-0802-0
Text en
© The Author(s) 2019. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A plea for taking all available clinical information into account when assessing the predictive value of omics data
topic Research Article