A plea for taking all available clinical information into account when assessing the predictive value of omics data
BACKGROUND: Omics data can be very informative in survival analysis and may improve the prognostic ability of classical models based on clinical risk factors for various diseases, for example breast cancer. Recent research has focused on integrating omics and clinical data, yet has often ignored the...
Main authors: | Volkmann, Alexander; De Bin, Riccardo; Sauerbrei, Willi; Boulesteix, Anne-Laure |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6657034/ https://www.ncbi.nlm.nih.gov/pubmed/31340753 http://dx.doi.org/10.1186/s12874-019-0802-0 |
_version_ | 1783438728891465728 |
---|---|
author | Volkmann, Alexander De Bin, Riccardo Sauerbrei, Willi Boulesteix, Anne-Laure |
author_sort | Volkmann, Alexander |
collection | PubMed |
description | BACKGROUND: Omics data can be very informative in survival analysis and may improve the prognostic ability of classical models based on clinical risk factors for various diseases, for example breast cancer. Recent research has focused on integrating omics and clinical data, yet has often ignored the need for appropriate model building for clinical variables. Medical literature on classical prognostic scores, as well as biostatistical literature on appropriate model selection strategies for low dimensional (clinical) data, are often ignored in the context of omics research. The goal of this paper is to fill this methodological gap by investigating the added predictive value of gene expression data for models using varying amounts of clinical information. METHODS: We analyze two data sets from the field of survival prognosis of breast cancer patients. First, we construct several proportional hazards prediction models using varying amounts of clinical information based on established medical knowledge. These models are then used as a starting point (i.e. included as a clinical offset) for identifying informative gene expression variables using resampling procedures and penalized regression approaches (model based boosting and the LASSO). In order to assess the added predictive value of the gene signatures, measures of prediction accuracy and separation are examined on a validation data set for the clinical models and the models that combine the two sources of information. RESULTS: For one data set, we do not find any substantial added predictive value of the omics data when compared to clinical models. On the second data set, we identify a noticeable added predictive value, however only for scenarios where little or no clinical information is included in the modeling process. We find that including more clinical information can lead to a smaller number of selected omics predictors. CONCLUSIONS: New research using omics data should include all available established medical knowledge in order to allow an adequate evaluation of the added predictive value of omics data. Including all relevant clinical information in the analysis might also lead to more parsimonious models. The developed procedure to assess the predictive value of the omics data can be readily applied to other scenarios. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12874-019-0802-0) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6657034 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6657034 2019-07-31 A plea for taking all available clinical information into account when assessing the predictive value of omics data Volkmann, Alexander De Bin, Riccardo Sauerbrei, Willi Boulesteix, Anne-Laure BMC Med Res Methodol Research Article BioMed Central 2019-07-24 /pmc/articles/PMC6657034/ /pubmed/31340753 http://dx.doi.org/10.1186/s12874-019-0802-0 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | A plea for taking all available clinical information into account when assessing the predictive value of omics data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6657034/ https://www.ncbi.nlm.nih.gov/pubmed/31340753 http://dx.doi.org/10.1186/s12874-019-0802-0 |