
Outcome prediction based on microarray analysis: a critical perspective on methods

BACKGROUND: Information extraction from microarrays has not yet been widely used in diagnostic or prognostic decision-support systems, due to the diversity of results produced by the available techniques, their instability on different data sets and the inability to relate statistical significance w...


Bibliographic Details
Main Authors:	Zervakis, Michalis; Blazadonakis, Michalis E; Tsiliki, Georgia; Danilatou, Vasiliki; Tsiknakis, Manolis; Kafetzopoulos, Dimitris
Format:	Text
Language:	English
Published:	BioMed Central 2009
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2667512/ https://www.ncbi.nlm.nih.gov/pubmed/19200394 http://dx.doi.org/10.1186/1471-2105-10-53

author	Zervakis, Michalis; Blazadonakis, Michalis E; Tsiliki, Georgia; Danilatou, Vasiliki; Tsiknakis, Manolis; Kafetzopoulos, Dimitris
author_sort	Zervakis, Michalis
collection	PubMed
description	BACKGROUND: Information extraction from microarrays has not yet been widely used in diagnostic or prognostic decision-support systems, due to the diversity of results produced by the available techniques, their instability on different data sets and the inability to relate statistical significance with biological relevance. Thus, there is an urgent need to address the statistical framework of microarray analysis and identify its drawbacks and limitations, which will enable us to thoroughly compare methodologies under the same experimental set-up and associate results with confidence intervals meaningful to clinicians. In this study we consider gene-selection algorithms with the aim to reveal inefficiencies in performance evaluation and address aspects that can reduce uncertainty in algorithmic validation. RESULTS: A computational study is performed related to the performance of several gene selection methodologies on publicly available microarray data. Three basic types of experimental scenarios are evaluated, i.e. the independent test-set and the 10-fold cross-validation (CV) using maximum and average performance measures. Feature selection methods behave differently under different validation strategies. The performance results from CV do not match well those from the independent test-set, except for the support vector machines (SVM) and the least squares SVM methods. However, these wrapper methods achieve variable (often low) performance, whereas the hybrid methods attain consistently higher accuracies. The use of an independent test-set within CV is important for the evaluation of the predictive power of algorithms. The optimal size of the selected gene-set also appears to be dependent on the evaluation scheme. The consistency of selected genes over variation of the training-set is another aspect important in reducing uncertainty in the evaluation of the derived gene signature. In all cases the presence of outlier samples can seriously affect algorithmic performance. CONCLUSION: Multiple parameters can influence the selection of a gene-signature and its predictive power, thus possible biases in validation methods must always be accounted for. This paper illustrates that independent test-set evaluation reduces the bias of CV, and case-specific measures reveal stability characteristics of the gene-signature over changes of the training set. Moreover, frequency measures on gene selection address the algorithmic consistency in selecting the same gene signature under different training conditions. These issues contribute to the development of an objective evaluation framework and aid the derivation of statistically consistent gene signatures that could eventually be correlated with biological relevance. The benefits of the proposed framework are supported by the evaluation results and methodological comparisons performed for several gene-selection algorithms on three publicly available datasets.
format	Text
id	pubmed-2667512
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2667512 2009-04-10 Outcome prediction based on microarray analysis: a critical perspective on methods Zervakis, Michalis Blazadonakis, Michalis E Tsiliki, Georgia Danilatou, Vasiliki Tsiknakis, Manolis Kafetzopoulos, Dimitris BMC Bioinformatics Research Article BACKGROUND: Information extraction from microarrays has not yet been widely used in diagnostic or prognostic decision-support systems, due to the diversity of results produced by the available techniques, their instability on different data sets and the inability to relate statistical significance with biological relevance. Thus, there is an urgent need to address the statistical framework of microarray analysis and identify its drawbacks and limitations, which will enable us to thoroughly compare methodologies under the same experimental set-up and associate results with confidence intervals meaningful to clinicians. In this study we consider gene-selection algorithms with the aim to reveal inefficiencies in performance evaluation and address aspects that can reduce uncertainty in algorithmic validation. RESULTS: A computational study is performed related to the performance of several gene selection methodologies on publicly available microarray data. Three basic types of experimental scenarios are evaluated, i.e. the independent test-set and the 10-fold cross-validation (CV) using maximum and average performance measures. Feature selection methods behave differently under different validation strategies. The performance results from CV do not match well those from the independent test-set, except for the support vector machines (SVM) and the least squares SVM methods. However, these wrapper methods achieve variable (often low) performance, whereas the hybrid methods attain consistently higher accuracies. The use of an independent test-set within CV is important for the evaluation of the predictive power of algorithms. The optimal size of the selected gene-set also appears to be dependent on the evaluation scheme. The consistency of selected genes over variation of the training-set is another aspect important in reducing uncertainty in the evaluation of the derived gene signature. In all cases the presence of outlier samples can seriously affect algorithmic performance. CONCLUSION: Multiple parameters can influence the selection of a gene-signature and its predictive power, thus possible biases in validation methods must always be accounted for. This paper illustrates that independent test-set evaluation reduces the bias of CV, and case-specific measures reveal stability characteristics of the gene-signature over changes of the training set. Moreover, frequency measures on gene selection address the algorithmic consistency in selecting the same gene signature under different training conditions. These issues contribute to the development of an objective evaluation framework and aid the derivation of statistically consistent gene signatures that could eventually be correlated with biological relevance. The benefits of the proposed framework are supported by the evaluation results and methodological comparisons performed for several gene-selection algorithms on three publicly available datasets. BioMed Central 2009-02-07 /pmc/articles/PMC2667512/ /pubmed/19200394 http://dx.doi.org/10.1186/1471-2105-10-53 Text en Copyright © 2009 Zervakis et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Outcome prediction based on microarray analysis: a critical perspective on methods
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2667512/ https://www.ncbi.nlm.nih.gov/pubmed/19200394 http://dx.doi.org/10.1186/1471-2105-10-53
