
Maximum predictive power of the microarray-based models for clinical outcomes is limited by correlation between endpoint and gene expression profile

Bibliographic Details
Main authors: Zhao, Chen; Shi, Leming; Tong, Weida; Shaughnessy, John D; Oberthuer, André; Pusztai, Lajos; Deng, Youping; Symmans, W Fraser; Shi, Tieliu
Format: Online, Article, Text
Language: English
Published: BioMed Central, 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3287499/
https://www.ncbi.nlm.nih.gov/pubmed/22369035
http://dx.doi.org/10.1186/1471-2164-12-S5-S3
collection PubMed
description BACKGROUND: Microarray data have been used for gene signature selection to predict clinical outcomes. Many studies have attempted to identify factors that affect model performance, with little success. Fine-tuning model parameters and optimizing each step of the modeling process often results in over-fitting without improving performance. RESULTS: We propose a quantitative measure, termed the consistency degree, to detect the correlation between a disease endpoint and the gene expression profile. Different endpoints were shown to have different consistency degrees with gene expression profiles. The validity of this measure for estimating consistency was tested, with significance at p < 2.2e-16 for all of the studied endpoints. Based on the consistency degree score, we propose extending the overall survival milestone outcome for multiple myeloma from 730 days to 1561 days, a cutoff more consistent with the gene expression profile. CONCLUSION: For various clinical endpoints, the maximum predictive power of different microarray-based models is limited by the correlation between the endpoint and the gene expression profile of disease samples, as indicated by the consistency degree score. In addition, previously defined clinical outcomes can be reassessed and refined to be more coherent with the related disease gene expression profile. Our findings point to an entirely new direction for assessing microarray-based predictive models and provide important information for gene-signature-based clinical applications.
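The abstract does not state how an overall survival milestone outcome is encoded as a class label. The Python below is a minimal sketch, assuming one common convention: a patient is labeled 1 if death is observed before the milestone, 0 if follow-up reaches the milestone, and is excluded (None) if censored earlier. The function name `milestone_labels` and the sample records are hypothetical, and this is not the authors' consistency-degree computation; it only shows why moving the cutoff from 730 to 1561 days can change individual labels.

```python
from typing import List, Optional, Tuple


def milestone_labels(
    records: List[Tuple[float, bool]], milestone_days: float
) -> List[Optional[int]]:
    """Dichotomize overall survival at a milestone (assumed convention).

    Each record is (follow_up_days, died). Returns 1 if death was observed
    before the milestone, 0 if follow-up reached the milestone, and None if
    the patient was censored before the milestone (outcome unknown).
    """
    labels: List[Optional[int]] = []
    for days, died in records:
        if died and days < milestone_days:
            labels.append(1)  # event observed before the milestone
        elif days >= milestone_days:
            labels.append(0)  # known to be alive at the milestone
        else:
            labels.append(None)  # censored early: excluded from training
    return labels


# Hypothetical (follow_up_days, died) records, for illustration only.
records = [(500, True), (900, True), (1600, False), (1000, False), (2000, True)]
print(milestone_labels(records, 730))   # [1, 0, 0, 0, 0]
print(milestone_labels(records, 1561))  # [1, 1, 0, None, 0]
```

Under this convention, a longer milestone turns some negatives into positives and drops patients whose follow-up ends between the two cutoffs, so the label set a model is trained on changes with the endpoint definition.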
id pubmed-3287499
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal BMC Genomics (Research Article), BioMed Central, 2011-12-23; pubmed-3287499 (2012-03-01)
rights Copyright ©2011 Zhao et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article