UFFizi: a generic platform for ranking informative features

BACKGROUND: Feature selection is an important pre-processing task in the analysis of complex data. Selecting an appropriate subset of features can improve classification or clustering and lead to better understanding of the data. An important example is that of finding an informative group of genes...

Bibliographic Details
Main authors: Gottlieb, Assaf, Varshavsky, Roy, Linial, Michal, Horn, David
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2893168/
https://www.ncbi.nlm.nih.gov/pubmed/20525252
http://dx.doi.org/10.1186/1471-2105-11-300
author Gottlieb, Assaf
Varshavsky, Roy
Linial, Michal
Horn, David
collection PubMed
description BACKGROUND: Feature selection is an important pre-processing task in the analysis of complex data. Selecting an appropriate subset of features can improve classification or clustering and lead to better understanding of the data. An important example is that of finding an informative group of genes out of thousands that appear in gene-expression analysis. Numerous supervised methods have been suggested but only a few unsupervised ones exist. Unsupervised Feature Filtering (UFF) is such a method, based on an entropy measure of Singular Value Decomposition (SVD), ranking features and selecting a group of preferred ones. RESULTS: We analyze the statistical properties of UFF and present an efficient approximation for the calculation of its entropy measure. This allows us to develop a web-tool that implements the UFF algorithm. We propose novel criteria to indicate whether a considered dataset is amenable to feature selection by UFF. Relying on formalism similar to UFF we propose also an Unsupervised Detection of Outliers (UDO) method, providing a novel definition of outliers and producing a measure to rank the "outlier-degree" of an instance. Our methods are demonstrated on gene and microRNA expression datasets, covering viral infection disease and cancer. We apply UFFizi to select genes from these datasets and discuss their biological and medical relevance. CONCLUSIONS: Statistical properties extracted from the UFF algorithm can distinguish selected features from others. UFFizi is a framework that is based on the UFF algorithm and it is applicable for a wide range of diseases. The framework is also implemented as a web-tool. The web-tool is available at: http://adios.tau.ac.il/UFFizi
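The abstract describes UFF only in outline: features are ranked by an entropy measure computed from the singular values of the data matrix. The following is a minimal sketch of one way such a ranking could be computed, not the authors' implementation. It assumes a features-by-instances matrix, an entropy taken over normalized squared singular values, and a leave-one-out score per feature; the function names (svd_entropy, contribution_scores, rank_features) are hypothetical, and the efficient approximation mentioned in the abstract is not reproduced here.

```python
import numpy as np


def svd_entropy(a):
    """Normalized entropy of the squared singular values of a (assumed form)."""
    s = np.linalg.svd(a, compute_uv=False)
    total = np.sum(s ** 2)
    if len(s) < 2 or total == 0:
        return 0.0
    v = s ** 2 / total          # relative weight of each singular value
    v = v[v > 0]                # skip exact zeros so log() is defined
    return float(-np.sum(v * np.log(v)) / np.log(len(s)))


def contribution_scores(a):
    """Leave-one-out score: entropy of a minus entropy without feature i.

    Rows of a are features and columns are instances (an assumption).
    """
    full = svd_entropy(a)
    return np.array([
        full - svd_entropy(np.delete(a, i, axis=0))
        for i in range(a.shape[0])
    ])


def rank_features(a):
    """Return feature indices sorted by decreasing score, plus the scores."""
    ce = contribution_scores(a)
    return np.argsort(-ce), ce


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(6, 4))   # 6 hypothetical features, 4 instances
    order, ce = rank_features(data)
    print("ranking:", order)
    print("scores:", ce)
```

This brute-force version recomputes one SVD per feature, which is slow for thousands of genes; that cost is presumably what the approximation mentioned in the abstract addresses. Where to cut the ranked list to obtain the selected group is left open in this sketch.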
format Text
id pubmed-2893168
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2893168 2010-06-29 BMC Bioinformatics Research article BioMed Central 2010-06-03 Text en
Copyright ©2010 Gottlieb et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title UFFizi: a generic platform for ranking informative features
topic Research article