
Sampling strategies to evaluate the prognostic value of a new biomarker on a time-to-event end-point

BACKGROUND: The availability of large epidemiological or clinical datasets with stored biological samples allows the study of the prognostic value of novel biomarkers, but efficient designs are needed to select a subsample on which to measure them, for reasons of parsimony and cost. Two-phase stratified sampl...


Bibliographic Details
Main Authors: Graziano, Francesca; Valsecchi, Maria Grazia; Rebora, Paola
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8091513/
https://www.ncbi.nlm.nih.gov/pubmed/33941092
http://dx.doi.org/10.1186/s12874-021-01283-0
author Graziano, Francesca
Valsecchi, Maria Grazia
Rebora, Paola
collection PubMed
description BACKGROUND: The availability of large epidemiological or clinical datasets with stored biological samples allows the study of the prognostic value of novel biomarkers, but efficient designs are needed to select a subsample on which to measure them, for reasons of parsimony and cost. Two-phase stratified sampling is a flexible approach to such sub-sampling, but literature on the stratification variables to use in the sampling and on power evaluation is lacking, especially for survival data. METHODS: We compared the performance of different sampling designs to assess the prognostic value of a new biomarker on a time-to-event endpoint, applying a Cox model weighted by the inverse of the empirical inclusion probability. RESULTS: Our simulation results suggest that case-control sampling stratified (or post-stratified) by a surrogate variable of the marker can yield better performance than simple random, probability proportional to size, and case-control sampling. In the presence of a high censoring rate, results showed an advantage of nested case-control and counter-matching designs in terms of design effect, although the use of a fixed ratio between cases and controls might be disadvantageous. On real data on childhood acute lymphoblastic leukemia, we found that optimal sampling using pilot data is highly efficient. CONCLUSIONS: Our study suggests that, in our sample, case-control stratified by surrogate and nested case-control designs yield estimates and power comparable to those obtained in the full cohort while strongly decreasing the number of patients required. We recommend planning the sample size and using sampling designs for the exploration of novel biomarkers in clinical cohort data. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12874-021-01283-0.
format Online
Article
Text
id pubmed-8091513
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8091513 2021-05-04 Sampling strategies to evaluate the prognostic value of a new biomarker on a time-to-event end-point Graziano, Francesca Valsecchi, Maria Grazia Rebora, Paola BMC Med Res Methodol Research Article BioMed Central 2021-04-30 /pmc/articles/PMC8091513/ /pubmed/33941092 http://dx.doi.org/10.1186/s12874-021-01283-0 Text en © The Author(s) 2021
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Sampling strategies to evaluate the prognostic value of a new biomarker on a time-to-event end-point
topic Research Article
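
Illustrative note on the METHODS sentence of the description: the abstract refers to a Cox model "weighted by the inverse of the empirical inclusion probability". The sketch below is not the authors' code; it only shows, under stated assumptions, how such weights could be derived for a phase-two sample stratified by event status and a binary surrogate. The simulated cohort, the field names (`event`, `surrogate`, `incl_prob`, `weight`), and the fixed per-stratum sample size are all hypothetical.

```python
import random
from collections import defaultdict


def simulate_cohort(n, seed=0):
    """Build a toy phase-one cohort (hypothetical event and surrogate rates)."""
    rng = random.Random(seed)
    cohort = []
    for i in range(n):
        surrogate = int(rng.random() < 0.3)
        # Assumption: events are more frequent when the surrogate is positive.
        event = int(rng.random() < (0.25 if surrogate else 0.10))
        cohort.append({"id": i, "event": event, "surrogate": surrogate})
    return cohort


def two_phase_weights(cohort, n_per_stratum, seed=0):
    """Draw a stratified phase-two sample and attach inverse-probability weights.

    Strata are defined by event status crossed with the surrogate. The
    empirical inclusion probability of a stratum is the number of subjects
    sampled from it divided by its size in the full cohort.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for subject in cohort:
        strata[(subject["event"], subject["surrogate"])].append(subject)

    sample = []
    for key in sorted(strata):
        members = strata[key]
        n_take = min(n_per_stratum, len(members))
        prob = n_take / len(members)  # empirical inclusion probability
        for subject in rng.sample(members, n_take):
            sample.append({**subject, "incl_prob": prob, "weight": 1.0 / prob})
    return sample


cohort = simulate_cohort(2000, seed=1)
sample = two_phase_weights(cohort, n_per_stratum=50, seed=1)
weighted_total = sum(s["weight"] for s in sample)
print(len(sample), round(weighted_total))
```

Within each stratum the weights sum to the stratum size, so the weighted sample total reproduces the size of the full cohort. The `weight` column would then be supplied to whatever weighted Cox routine is used, together with a variance estimator that accounts for the sampling design.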