Lack of sufficiently strong informative features limits the potential of gene expression analysis as predictive tool for many clinical classification problems

BACKGROUND: Our goal was to examine how various aspects of a gene signature influence the success of developing multi-gene prediction models. We inserted gene signatures into three real data sets by altering the expression level of existing probe sets. We varied the number of probe sets perturbed (s...

Bibliographic Details
Main authors: Hess, Kenneth R; Wei, Caimiao; Qi, Yuan; Iwamoto, Takayuki; Symmans, W Fraser; Pusztai, Lajos
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245512/
https://www.ncbi.nlm.nih.gov/pubmed/22132775
http://dx.doi.org/10.1186/1471-2105-12-463
author Hess, Kenneth R
Wei, Caimiao
Qi, Yuan
Iwamoto, Takayuki
Symmans, W Fraser
Pusztai, Lajos
collection PubMed
description BACKGROUND: Our goal was to examine how various aspects of a gene signature influence the success of developing multi-gene prediction models. We inserted gene signatures into three real data sets by altering the expression level of existing probe sets. We varied the number of probe sets perturbed (signature size), the fold increase of mean probe set expression in perturbed compared to unperturbed data (signature strength), and the number of samples perturbed. Prediction models were trained to identify which cases had been perturbed. Performance was estimated using Monte Carlo cross-validation. RESULTS: Signature strength had the greatest influence on predictor performance. It was possible to develop almost perfect predictors with as few as 10 features if the fold difference in mean expression values was > 2, even when the spiked samples represented 10% of all samples. We also assessed the gene signature set size and strength for 9 real clinical prediction problems in six different breast cancer data sets. CONCLUSIONS: We found sufficiently large and strong predictive signatures only for distinguishing ER-positive from ER-negative cancers; there were no strong signatures for more subtle prediction problems. Current statistical methods efficiently identify highly informative features in gene expression data if such features exist, and accurate models can be built with as few as 10 highly informative features. Features can be considered highly informative if at least a 2-fold expression difference exists between comparison groups, but such features do not appear to be common for many clinically relevant prediction problems in human data sets.
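The spike-in design summarized in the description can be illustrated with a small sketch. It is not the authors' code: the abstract does not name the classifiers or the feature-selection rule, so a nearest-centroid classifier and a t-statistic filter are used here as stand-ins, the data are synthetic Gaussian noise on an assumed log2 scale rather than the three real data sets, and every function name (`make_noise`, `spike_signature`, `top_features`, `monte_carlo_cv`) is hypothetical.

```python
import math
import random


def make_noise(n_samples, n_probes, rng):
    """Synthetic log2 expression matrix (assumption): rows = samples, columns = probe sets."""
    return [[rng.gauss(8.0, 1.0) for _ in range(n_probes)] for _ in range(n_samples)]


def spike_signature(data, n_probes, fold, n_spiked, rng):
    """Copy data and raise n_probes probe sets by `fold` in n_spiked randomly chosen samples.

    On a log2 scale a fold increase is an additive shift of log2(fold).
    Returns the perturbed matrix and 0/1 labels (1 = spiked sample).
    """
    spiked = set(rng.sample(range(len(data)), n_spiked))
    probes = rng.sample(range(len(data[0])), n_probes)
    shift = math.log2(fold)
    out = [row[:] for row in data]
    for i in spiked:
        for j in probes:
            out[i][j] += shift
    return out, [1 if i in spiked else 0 for i in range(len(data))]


def top_features(X, y, k):
    """Rank probe sets by absolute two-sample t statistic, using the given (training) rows only."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 1]
        b = [row[j] for row, lab in zip(X, y) if lab == 0]
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        va = sum((v - ma) ** 2 for v in a) / max(len(a) - 1, 1)
        vb = sum((v - mb) ** 2 for v in b) / max(len(b) - 1, 1)
        se = math.sqrt(va / len(a) + vb / len(b)) or 1e-12
        scores.append((abs(ma - mb) / se, j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def centroid(rows, feats):
    """Mean expression of the selected features over the given rows."""
    return [sum(r[j] for r in rows) / len(rows) for j in feats]


def monte_carlo_cv(X, y, k=10, n_splits=20, test_frac=1 / 3, rng=None):
    """Mean balanced accuracy over repeated random stratified train/test splits."""
    rng = rng or random.Random()
    pos = [i for i, lab in enumerate(y) if lab == 1]
    neg = [i for i, lab in enumerate(y) if lab == 0]
    results = []
    for _ in range(n_splits):
        test = set(rng.sample(pos, max(1, round(len(pos) * test_frac))))
        test |= set(rng.sample(neg, max(1, round(len(neg) * test_frac))))
        train = [i for i in range(len(y)) if i not in test]
        # Feature selection is repeated inside every split to avoid information leakage.
        feats = top_features([X[i] for i in train], [y[i] for i in train], k)
        c1 = centroid([X[i] for i in train if y[i] == 1], feats)
        c0 = centroid([X[i] for i in train if y[i] == 0], feats)
        tp = fn = tn = fp = 0
        for i in test:
            d1 = sum((X[i][j] - c) ** 2 for j, c in zip(feats, c1))
            d0 = sum((X[i][j] - c) ** 2 for j, c in zip(feats, c0))
            pred = 1 if d1 < d0 else 0
            if y[i] == 1:
                tp, fn = tp + pred, fn + (1 - pred)
            else:
                tn, fp = tn + (1 - pred), fp + pred
        results.append(0.5 * (tp / (tp + fn) + tn / (tn + fp)))
    return sum(results) / len(results)


if __name__ == "__main__":
    rng = random.Random(1)
    base = make_noise(60, 200, rng)
    for fold in (1.0, 1.5, 2.0, 4.0):
        X, y = spike_signature(base, n_probes=10, fold=fold, n_spiked=20, rng=rng)
        print(fold, round(monte_carlo_cv(X, y, k=10, rng=rng), 3))
```

Running the script prints the mean balanced accuracy for several signature strengths; under these synthetic assumptions, performance would be expected to rise with the fold change, in the spirit of the reported result. Balanced accuracy is used here only because the spiked group can be a small minority; the abstract does not state which performance metric the study used.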
format Online
Article
Text
id pubmed-3245512
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3245512 2011-12-24 BMC Bioinformatics Research Article
BioMed Central 2011-12-01 /pmc/articles/PMC3245512/ /pubmed/22132775 http://dx.doi.org/10.1186/1471-2105-12-463 Text en Copyright ©2011 Hess et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Lack of sufficiently strong informative features limits the potential of gene expression analysis as predictive tool for many clinical classification problems
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245512/
https://www.ncbi.nlm.nih.gov/pubmed/22132775
http://dx.doi.org/10.1186/1471-2105-12-463