Exploiting the noise: improving biomarkers with ensembles of data analysis methodologies
BACKGROUND: The advent of personalized medicine requires robust, reproducible biomarkers that indicate which treatment will maximize therapeutic benefit while minimizing side effects and costs. Numerous molecular signatures have been developed over the past decade to fill this need, but their valida...
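Main authors: | Starmans, Maud HW; Pintilie, Melania; John, Thomas; Der, Sandy D; Shepherd, Frances A; Jurisica, Igor; Lambin, Philippe; Tsao, Ming-Sound; Boutros, Paul C |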
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3580418/ https://www.ncbi.nlm.nih.gov/pubmed/23146350 http://dx.doi.org/10.1186/gm385 |
_version_ | 1782260242015322112 |
---|---|
author | Starmans, Maud HW; Pintilie, Melania; John, Thomas; Der, Sandy D; Shepherd, Frances A; Jurisica, Igor; Lambin, Philippe; Tsao, Ming-Sound; Boutros, Paul C |
author_facet | Starmans, Maud HW; Pintilie, Melania; John, Thomas; Der, Sandy D; Shepherd, Frances A; Jurisica, Igor; Lambin, Philippe; Tsao, Ming-Sound; Boutros, Paul C |
author_sort | Starmans, Maud HW |
collection | PubMed |
description | BACKGROUND: The advent of personalized medicine requires robust, reproducible biomarkers that indicate which treatment will maximize therapeutic benefit while minimizing side effects and costs. Numerous molecular signatures have been developed over the past decade to fill this need, but their validation and up-take into clinical settings has been poor. Here, we investigate the technical reasons underlying reported failures in biomarker validation for non-small cell lung cancer (NSCLC). METHODS: We evaluated two published prognostic multi-gene biomarkers for NSCLC in an independent 442-patient dataset. We then systematically assessed how technical factors influenced validation success. RESULTS: Both biomarkers validated successfully (biomarker #1: hazard ratio (HR) 1.63, 95% confidence interval (CI) 1.21 to 2.19, P = 0.001; biomarker #2: HR 1.42, 95% CI 1.03 to 1.96, P = 0.030). Further, despite being underpowered for stage-specific analyses, both biomarkers successfully stratified stage II patients and biomarker #1 also stratified stage IB patients. We then systematically evaluated reasons for reported validation failures and find they can be directly attributed to technical challenges in data analysis. By examining 24 separate pre-processing techniques we show that minor alterations in pre-processing can change a successful prognostic biomarker (HR 1.85, 95% CI 1.37 to 2.50, P < 0.001) into one indistinguishable from random chance (HR 1.15, 95% CI 0.86 to 1.54, P = 0.348). Finally, we develop a new method, based on ensembles of analysis methodologies, to exploit this technical variability to improve biomarker robustness and to provide an independent confidence metric. CONCLUSIONS: Biomarkers comprise a fundamental component of personalized medicine. We first validated two NSCLC prognostic biomarkers in an independent patient cohort. Power analyses demonstrate that even this large, 442-patient cohort is under-powered for stage-specific analyses. We then use these results to discover an unexpected sensitivity of validation to subtle data analysis decisions. Finally, we develop a novel algorithmic approach to exploit this sensitivity to improve biomarker robustness. |
format | Online Article Text |
id | pubmed-3580418 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3580418 2013-03-04 Exploiting the noise: improving biomarkers with ensembles of data analysis methodologies Starmans, Maud HW Pintilie, Melania John, Thomas Der, Sandy D Shepherd, Frances A Jurisica, Igor Lambin, Philippe Tsao, Ming-Sound Boutros, Paul C Genome Med Research BACKGROUND: The advent of personalized medicine requires robust, reproducible biomarkers that indicate which treatment will maximize therapeutic benefit while minimizing side effects and costs. Numerous molecular signatures have been developed over the past decade to fill this need, but their validation and up-take into clinical settings has been poor. Here, we investigate the technical reasons underlying reported failures in biomarker validation for non-small cell lung cancer (NSCLC). METHODS: We evaluated two published prognostic multi-gene biomarkers for NSCLC in an independent 442-patient dataset. We then systematically assessed how technical factors influenced validation success. RESULTS: Both biomarkers validated successfully (biomarker #1: hazard ratio (HR) 1.63, 95% confidence interval (CI) 1.21 to 2.19, P = 0.001; biomarker #2: HR 1.42, 95% CI 1.03 to 1.96, P = 0.030). Further, despite being underpowered for stage-specific analyses, both biomarkers successfully stratified stage II patients and biomarker #1 also stratified stage IB patients. We then systematically evaluated reasons for reported validation failures and find they can be directly attributed to technical challenges in data analysis. By examining 24 separate pre-processing techniques we show that minor alterations in pre-processing can change a successful prognostic biomarker (HR 1.85, 95% CI 1.37 to 2.50, P < 0.001) into one indistinguishable from random chance (HR 1.15, 95% CI 0.86 to 1.54, P = 0.348). Finally, we develop a new method, based on ensembles of analysis methodologies, to exploit this technical variability to improve biomarker robustness and to provide an independent confidence metric. CONCLUSIONS: Biomarkers comprise a fundamental component of personalized medicine. We first validated two NSCLC prognostic biomarkers in an independent patient cohort. Power analyses demonstrate that even this large, 442-patient cohort is under-powered for stage-specific analyses. We then use these results to discover an unexpected sensitivity of validation to subtle data analysis decisions. Finally, we develop a novel algorithmic approach to exploit this sensitivity to improve biomarker robustness. BioMed Central 2012-11-12 /pmc/articles/PMC3580418/ /pubmed/23146350 http://dx.doi.org/10.1186/gm385 Text en Copyright ©2012 Starmans et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Starmans, Maud HW Pintilie, Melania John, Thomas Der, Sandy D Shepherd, Frances A Jurisica, Igor Lambin, Philippe Tsao, Ming-Sound Boutros, Paul C Exploiting the noise: improving biomarkers with ensembles of data analysis methodologies |
title | Exploiting the noise: improving biomarkers with ensembles of data analysis methodologies |
title_full | Exploiting the noise: improving biomarkers with ensembles of data analysis methodologies |
title_fullStr | Exploiting the noise: improving biomarkers with ensembles of data analysis methodologies |
title_full_unstemmed | Exploiting the noise: improving biomarkers with ensembles of data analysis methodologies |
title_short | Exploiting the noise: improving biomarkers with ensembles of data analysis methodologies |
title_sort | exploiting the noise: improving biomarkers with ensembles of data analysis methodologies |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3580418/ https://www.ncbi.nlm.nih.gov/pubmed/23146350 http://dx.doi.org/10.1186/gm385 |
work_keys_str_mv | AT starmansmaudhw exploitingthenoiseimprovingbiomarkerswithensemblesofdataanalysismethodologies AT pintiliemelania exploitingthenoiseimprovingbiomarkerswithensemblesofdataanalysismethodologies AT johnthomas exploitingthenoiseimprovingbiomarkerswithensemblesofdataanalysismethodologies AT dersandyd exploitingthenoiseimprovingbiomarkerswithensemblesofdataanalysismethodologies AT shepherdfrancesa exploitingthenoiseimprovingbiomarkerswithensemblesofdataanalysismethodologies AT jurisicaigor exploitingthenoiseimprovingbiomarkerswithensemblesofdataanalysismethodologies AT lambinphilippe exploitingthenoiseimprovingbiomarkerswithensemblesofdataanalysismethodologies AT tsaomingsound exploitingthenoiseimprovingbiomarkerswithensemblesofdataanalysismethodologies AT boutrospaulc exploitingthenoiseimprovingbiomarkerswithensemblesofdataanalysismethodologies |