Exploiting the noise: improving biomarkers with ensembles of data analysis methodologies

Bibliographic Details
Main authors: Starmans, Maud HW; Pintilie, Melania; John, Thomas; Der, Sandy D; Shepherd, Frances A; Jurisica, Igor; Lambin, Philippe; Tsao, Ming-Sound; Boutros, Paul C
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3580418/
https://www.ncbi.nlm.nih.gov/pubmed/23146350
http://dx.doi.org/10.1186/gm385
author Starmans, Maud HW
Pintilie, Melania
John, Thomas
Der, Sandy D
Shepherd, Frances A
Jurisica, Igor
Lambin, Philippe
Tsao, Ming-Sound
Boutros, Paul C
collection PubMed
description BACKGROUND: The advent of personalized medicine requires robust, reproducible biomarkers that indicate which treatment will maximize therapeutic benefit while minimizing side effects and costs. Numerous molecular signatures have been developed over the past decade to fill this need, but their validation and uptake into clinical settings have been poor. Here, we investigate the technical reasons underlying reported failures in biomarker validation for non-small cell lung cancer (NSCLC).
METHODS: We evaluated two published prognostic multi-gene biomarkers for NSCLC in an independent 442-patient dataset. We then systematically assessed how technical factors influenced validation success.
RESULTS: Both biomarkers validated successfully (biomarker #1: hazard ratio (HR) 1.63, 95% confidence interval (CI) 1.21 to 2.19, P = 0.001; biomarker #2: HR 1.42, 95% CI 1.03 to 1.96, P = 0.030). Further, although the cohort was underpowered for stage-specific analyses, both biomarkers successfully stratified stage II patients, and biomarker #1 also stratified stage IB patients. We then systematically evaluated the reasons for reported validation failures and found that they can be directly attributed to technical challenges in data analysis. By examining 24 separate pre-processing techniques, we show that minor alterations in pre-processing can change a successful prognostic biomarker (HR 1.85, 95% CI 1.37 to 2.50, P < 0.001) into one indistinguishable from random chance (HR 1.15, 95% CI 0.86 to 1.54, P = 0.348). Finally, we develop a new method, based on ensembles of analysis methodologies, that exploits this technical variability to improve biomarker robustness and to provide an independent confidence metric.
CONCLUSIONS: Biomarkers are a fundamental component of personalized medicine. We first validated two NSCLC prognostic biomarkers in an independent patient cohort. Power analyses demonstrate that even this large, 442-patient cohort is underpowered for stage-specific analyses. We then used these results to uncover an unexpected sensitivity of validation to subtle data analysis decisions. Finally, we developed a novel algorithmic approach that exploits this sensitivity to improve biomarker robustness.
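The RESULTS report each hazard ratio together with a 95% confidence interval and a P value, and these three numbers constrain one another. As a hedged consistency check, the sketch below back-calculates an approximate two-sided P value from each HR and CI. It assumes (this record does not confirm it) that the intervals come from Cox proportional hazards models and are symmetric on the log scale, so that SE(log HR) is roughly (log upper - log lower) / (2 * 1.96) and the Wald statistic is log HR / SE. The function name `wald_p_from_ci` is hypothetical, and the paper may have used a different test, such as the log-rank test.

```python
import math
from statistics import NormalDist


def wald_p_from_ci(hr, lower, upper, level=0.95):
    """Approximate two-sided Wald P value for a hazard ratio, given its CI.

    Assumption: the CI was built as exp(log(hr) +/- z_crit * SE), i.e. it is
    symmetric on the log scale, as is typical for Cox model output.
    """
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)  # SE of log HR
    z = math.log(hr) / se  # Wald statistic on the log-HR scale
    return 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided tail probability


# HRs and 95% CI bounds as reported in the abstract; labels are illustrative.
reported = {
    "biomarker #1": (1.63, 1.21, 2.19),
    "biomarker #2": (1.42, 1.03, 1.96),
    "successful pre-processing": (1.85, 1.37, 2.50),
    "alternative pre-processing": (1.15, 0.86, 1.54),
}

for name, (hr, lo, hi) in reported.items():
    print(f"{name}: back-calculated P ~ {wald_p_from_ci(hr, lo, hi):.3g}")
```

Run as-is, the back-calculated values land close to the reported ones (about 0.001 for biomarker #1 and about 0.35 for the alternative pre-processing result). Small discrepancies, such as roughly 0.033 versus the reported 0.030 for biomarker #2, are expected because the published HRs and CI bounds are rounded to two decimal places.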
format Online
Article
Text
id pubmed-3580418
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3580418 2013-03-04 Exploiting the noise: improving biomarkers with ensembles of data analysis methodologies Starmans, Maud HW Pintilie, Melania John, Thomas Der, Sandy D Shepherd, Frances A Jurisica, Igor Lambin, Philippe Tsao, Ming-Sound Boutros, Paul C Genome Med Research BioMed Central 2012-11-12 /pmc/articles/PMC3580418/ /pubmed/23146350 http://dx.doi.org/10.1186/gm385 Text en Copyright ©2012 Starmans et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Exploiting the noise: improving biomarkers with ensembles of data analysis methodologies
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3580418/
https://www.ncbi.nlm.nih.gov/pubmed/23146350
http://dx.doi.org/10.1186/gm385