Effects of sample size on robustness and prediction accuracy of a prognostic gene signature
BACKGROUND: Little overlap between independently developed gene signatures and poor inter-study applicability of gene signatures are two of the major concerns raised in the development of microarray-based prognostic gene signatures. One recent study suggested that thousands of samples are needed to generat...
Main author: | Kim, Seon-Young |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2009 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2689196/ https://www.ncbi.nlm.nih.gov/pubmed/19445687 http://dx.doi.org/10.1186/1471-2105-10-147 |
_version_ | 1782167757311180800 |
---|---|
author | Kim, Seon-Young |
author_facet | Kim, Seon-Young |
author_sort | Kim, Seon-Young |
collection | PubMed |
description | BACKGROUND: Little overlap between independently developed gene signatures and poor inter-study applicability of gene signatures are two of the major concerns raised in the development of microarray-based prognostic gene signatures. One recent study suggested that thousands of samples are needed to generate a robust prognostic gene signature. RESULTS: A data set of 1,372 samples was generated by combining eight breast cancer gene expression data sets produced using the same microarray platform, and this data set was used to investigate the effects of varying sample sizes on several performance measures of a prognostic gene signature. The overlap between independently developed gene signatures increased linearly with more samples, attaining an average overlap of 16.56% with 600 samples. The concordance between outcomes predicted by different gene signatures also increased with more samples, reaching 94.61% with 300 samples. The accuracy of outcome prediction also increased with more samples. Finally, analysis using only Estrogen Receptor-positive (ER+) patients attained higher prediction accuracy than analysis using both ER+ and ER-negative patients, suggesting that sub-type specific analysis can lead to the development of better prognostic gene signatures. CONCLUSION: Increasing sample sizes generated a gene signature with better stability, better concordance in outcome prediction, and better prediction accuracy. However, the degree of improvement from increased sample size differed between signature overlap and concordance in outcome prediction, suggesting that the sample size required for a study should be determined according to the specific aims of the study. |
format | Text |
id | pubmed-2689196 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2689196 2009-06-02 Effects of sample size on robustness and prediction accuracy of a prognostic gene signature Kim, Seon-Young BMC Bioinformatics Research Article BioMed Central 2009-05-16 /pmc/articles/PMC2689196/ /pubmed/19445687 http://dx.doi.org/10.1186/1471-2105-10-147 Text en Copyright © 2009 Kim; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Kim, Seon-Young Effects of sample size on robustness and prediction accuracy of a prognostic gene signature |
title | Effects of sample size on robustness and prediction accuracy of a prognostic gene signature |
title_full | Effects of sample size on robustness and prediction accuracy of a prognostic gene signature |
title_fullStr | Effects of sample size on robustness and prediction accuracy of a prognostic gene signature |
title_full_unstemmed | Effects of sample size on robustness and prediction accuracy of a prognostic gene signature |
title_short | Effects of sample size on robustness and prediction accuracy of a prognostic gene signature |
title_sort | effects of sample size on robustness and prediction accuracy of a prognostic gene signature |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2689196/ https://www.ncbi.nlm.nih.gov/pubmed/19445687 http://dx.doi.org/10.1186/1471-2105-10-147 |
work_keys_str_mv | AT kimseonyoung effectsofsamplesizeonrobustnessandpredictionaccuracyofaprognosticgenesignature |
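
The abstract reports two agreement measures: the overlap between independently developed gene signatures and the concordance between the outcomes they predict. The record does not give the article's exact formulas, so the following is only a minimal sketch under assumed definitions: overlap is taken as the share of genes in common relative to the smaller signature, and concordance as the share of samples given the same outcome label. The function names, gene names, and labels are hypothetical and the inputs are made-up toy values, not data from the study.

```python
from typing import Hashable, Iterable, Sequence


def signature_overlap(sig_a: Iterable[Hashable], sig_b: Iterable[Hashable]) -> float:
    """Percentage of shared genes, relative to the smaller signature.

    Assumed definition for illustration; the article may define overlap differently.
    """
    a, b = set(sig_a), set(sig_b)
    if not a or not b:
        raise ValueError("signatures must be non-empty")
    return 100.0 * len(a & b) / min(len(a), len(b))


def prediction_concordance(pred_a: Sequence, pred_b: Sequence) -> float:
    """Percentage of samples assigned the same outcome label by both signatures.

    Assumed definition for illustration.
    """
    if len(pred_a) != len(pred_b) or not pred_a:
        raise ValueError("prediction lists must be non-empty and equal in length")
    agree = sum(x == y for x, y in zip(pred_a, pred_b))
    return 100.0 * agree / len(pred_a)


# Toy, made-up inputs (hypothetical gene names and outcome labels).
sig_1 = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
sig_2 = ["GENE_C", "GENE_D", "GENE_E", "GENE_F"]
overlap = signature_overlap(sig_1, sig_2)

calls_1 = ["good", "poor", "good", "good", "poor"]
calls_2 = ["good", "poor", "poor", "good", "poor"]
concordance = prediction_concordance(calls_1, calls_2)

print(f"overlap = {overlap:.1f}%, concordance = {concordance:.1f}%")
```

Under these definitions two signatures can share few genes yet still agree on most predicted outcomes, which is consistent with the abstract's point that the two measures respond differently to sample size.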