Comparison and evaluation of methods for generating differentially expressed gene lists from microarray data

BACKGROUND: Numerous feature selection methods have been applied to the identification of differentially expressed genes in microarray data. These include simple fold change, classical t-statistic and moderated t-statistics. Even though these methods return gene lists that are often dissimilar, few...

Bibliographic Details
Main Authors:	Jeffery, Ian B; Higgins, Desmond G; Culhane, Aedín C
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1544358/ https://www.ncbi.nlm.nih.gov/pubmed/16872483 http://dx.doi.org/10.1186/1471-2105-7-359

_version_	1782129203751157760
author	Jeffery, Ian B Higgins, Desmond G Culhane, Aedín C
collection	PubMed
description	BACKGROUND: Numerous feature selection methods have been applied to the identification of differentially expressed genes in microarray data. These include simple fold change, the classical t-statistic, and moderated t-statistics. Even though these methods return gene lists that are often dissimilar, few direct comparisons of them exist. We present an empirical study in which we compare some of the most commonly used feature selection methods. We apply these to 9 publicly available datasets and compare both the gene lists produced and how they perform in class prediction of test datasets. RESULTS: In this study, we compared the efficiency of the following feature selection methods: significance analysis of microarrays (SAM), analysis of variance (ANOVA), empirical Bayes t-statistic, template matching, maxT, between group analysis (BGA), area under the receiver operating characteristic (ROC) curve, the Welch t-statistic, fold change, rank products, and sets of randomly selected genes. In each case these methods were applied to 9 different binary (two class) microarray datasets. Firstly, we found little agreement in the gene lists produced by the different methods. Only 8 to 21% of genes were in common across all 10 feature selection methods. Secondly, we evaluated the class prediction efficiency of each gene list in training and test cross-validation using four supervised classifiers. CONCLUSION: We report that the choice of feature selection method, the number of genes in the gene list, the number of cases (samples), and the noise in the dataset substantially influence classification success. Recommendations are made for choice of feature selection. Area under a ROC curve performed well with datasets that had low levels of noise and large sample size. Rank products performed well when datasets had low numbers of samples or high levels of noise. The empirical Bayes t-statistic performed well across a range of sample sizes.
format	Text
id	pubmed-1544358
institution	National Center for Biotechnology Information
language	English
publishDate	2006
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1544358 2006-08-19 Comparison and evaluation of methods for generating differentially expressed gene lists from microarray data Jeffery, Ian B Higgins, Desmond G Culhane, Aedín C BMC Bioinformatics Research Article BioMed Central 2006-07-26 /pmc/articles/PMC1544358/ /pubmed/16872483 http://dx.doi.org/10.1186/1471-2105-7-359 Text en Copyright © 2006 Jeffery et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Comparison and evaluation of methods for generating differentially expressed gene lists from microarray data
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1544358/ https://www.ncbi.nlm.nih.gov/pubmed/16872483 http://dx.doi.org/10.1186/1471-2105-7-359
