
A Simple but Highly Effective Approach to Evaluate the Prognostic Performance of Gene Expression Signatures

BACKGROUND: Highly parallel analysis of gene expression has recently been used to identify gene sets or ‘signatures’ to improve patient diagnosis and risk stratification. Once a signature is generated, traditional statistical testing is used to evaluate its prognostic performance. However, due to th...

Bibliographic Details
Main authors: Starmans, Maud H. W.; Fung, Glenn; Steck, Harald; Wouters, Bradly G.; Lambin, Philippe
Format: Online Article Text
Language: English
Published: Public Library of Science 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3233554/
https://www.ncbi.nlm.nih.gov/pubmed/22163293
http://dx.doi.org/10.1371/journal.pone.0028320
author Starmans, Maud H. W.
Fung, Glenn
Steck, Harald
Wouters, Bradly G.
Lambin, Philippe
collection PubMed
description BACKGROUND: Highly parallel analysis of gene expression has recently been used to identify gene sets or ‘signatures’ to improve patient diagnosis and risk stratification. Once a signature is generated, traditional statistical testing is used to evaluate its prognostic performance. However, due to the dimensionality of microarrays, this can lead to false interpretation of these signatures. PRINCIPAL FINDINGS: A method was developed to test batches of a user-specified number of randomly chosen signatures in patient microarray datasets. The percentage of randomly generated signatures yielding prognostic value was assessed using ROC analysis by calculating the area under the curve (AUC) in six publicly available cancer patient microarray datasets. We found that a signature consisting of randomly selected genes has, on average, a 10% chance of reaching significance when assessed in a single dataset, but this chance can range from 1% to ∼40% depending on the dataset in question. Increasing the number of validation datasets markedly reduces this number. CONCLUSIONS: We have shown that an arbitrary cut-off value for evaluating signature significance is not suitable for this type of research; the cut-off should instead be defined for each dataset separately. Our method can be used to establish and evaluate the performance of any derived gene signature in a dataset by comparing its performance to thousands of randomly generated signatures. It will be of most interest for cases where few data are available and testing in multiple datasets is limited.
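The comparison against randomly generated signatures described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a signature is scored per patient as the mean expression of its genes, that the outcome is binary, and that the expression values are synthetic. All function names (`auc`, `signature_score`, `random_signature_aucs`, `empirical_p_value`) are hypothetical.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one patient in each outcome group")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def signature_score(expr, genes):
    """Per-patient score: mean expression over the signature genes (assumption)."""
    return [sum(row[g] for g in genes) / len(genes) for row in expr]

def random_signature_aucs(expr, labels, size, n_random, rng):
    """AUCs of n_random signatures, each made of `size` randomly drawn genes."""
    n_genes = len(expr[0])
    return [
        auc(signature_score(expr, rng.sample(range(n_genes), size)), labels)
        for _ in range(n_random)
    ]

def empirical_p_value(candidate_auc, null_aucs):
    """Share of random signatures that do at least as well as the candidate."""
    hits = sum(a >= candidate_auc for a in null_aucs)
    return (hits + 1) / (len(null_aucs) + 1)

# Synthetic stand-in for one dataset: 60 patients x 200 genes, binary outcome.
rng = random.Random(0)
expr = [[rng.gauss(0.0, 1.0) for _ in range(200)] for _ in range(60)]
labels = [i % 2 for i in range(60)]

null_aucs = random_signature_aucs(expr, labels, size=10, n_random=500, rng=rng)
candidate_auc = auc(signature_score(expr, list(range(10))), labels)
print("candidate AUC:", round(candidate_auc, 3))
print("empirical p-value:", round(empirical_p_value(candidate_auc, null_aucs), 3))
```

Because the null distribution is rebuilt from each dataset's own random signatures, the significance threshold is specific to that dataset, in line with the conclusion that the cut-off should be defined for each dataset separately.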
format Online
Article
Text
id pubmed-3233554
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3233554 2011-12-12 A Simple but Highly Effective Approach to Evaluate the Prognostic Performance of Gene Expression Signatures. Starmans, Maud H. W.; Fung, Glenn; Steck, Harald; Wouters, Bradly G.; Lambin, Philippe. PLoS One. Research Article. Public Library of Science 2011-12-07. /pmc/articles/PMC3233554/ /pubmed/22163293 http://dx.doi.org/10.1371/journal.pone.0028320 Text en Starmans et al.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
topic Research Article