
Structured feature selection using coordinate descent optimization

BACKGROUND: Existing feature selection methods typically do not consider prior knowledge in the form of structural relationships among features. In this study, the features are structured based on prior knowledge into groups. The problem addressed in this article is how to select one representative...


Bibliographic Details
Main authors: Ghalwash, Mohamed F., Cao, Xi Hang, Stojkovic, Ivan, Obradovic, Zoran
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4826549/
https://www.ncbi.nlm.nih.gov/pubmed/27059502
http://dx.doi.org/10.1186/s12859-016-0954-4
author Ghalwash, Mohamed F.
Cao, Xi Hang
Stojkovic, Ivan
Obradovic, Zoran
collection PubMed
description BACKGROUND: Existing feature selection methods typically do not consider prior knowledge in the form of structural relationships among features. In this study, the features are structured into groups based on prior knowledge. The problem addressed in this article is how to select one representative feature from each group such that the selected features jointly discriminate the classes. The problem is formulated as a binary constrained optimization, and the combinatorial optimization is relaxed as a convex-concave problem, which is then transformed into a sequence of convex optimization problems so that it can be solved by any standard optimization algorithm. Moreover, a block coordinate gradient descent optimization algorithm is proposed for high-dimensional feature selection, which in our experiments was four times faster than a standard optimization algorithm.
RESULTS: To test the effectiveness of the proposed formulation, we used microarray analysis as a case study, where genes with similar expression or similar molecular functions were grouped together. In particular, the proposed block coordinate gradient descent feature selection method is evaluated on five benchmark microarray gene expression datasets, and evidence is provided that the proposed method gives more accurate results than the state-of-the-art gene selection methods. Out of 25 experiments, the proposed method achieved the highest average AUC in 13 experiments, while the other methods achieved higher average AUC in no more than 6 experiments.
CONCLUSION: A method is developed to select a feature from each group. When the features are grouped based on similarity in gene expression, we showed that the proposed algorithm is more accurate than state-of-the-art gene selection methods that are specifically developed to select highly discriminative and less redundant genes. In addition, the proposed method can exploit any grouping structure among features, while alternative methods are restricted to similarity-based grouping.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-0954-4) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4826549
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4826549 2016-04-10 Structured feature selection using coordinate descent optimization Ghalwash, Mohamed F. Cao, Xi Hang Stojkovic, Ivan Obradovic, Zoran BMC Bioinformatics Methodology Article BioMed Central 2016-04-08 /pmc/articles/PMC4826549/ /pubmed/27059502 http://dx.doi.org/10.1186/s12859-016-0954-4 Text en
© Ghalwash et al. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Structured feature selection using coordinate descent optimization
topic Methodology Article