Minimum redundancy maximum relevance feature selection approach for temporal gene expression data

BACKGROUND: Feature selection, aiming to identify a subset of features among a possibly large set of features that are relevant for predicting a response, is an important preprocessing step in machine learning. In gene expression studies this is not a trivial task for several reasons, including pote...

Bibliographic Details
Main Authors: Radovic, Milos; Ghalwash, Mohamed; Filipovic, Nenad; Obradovic, Zoran
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5209828/
https://www.ncbi.nlm.nih.gov/pubmed/28049413
http://dx.doi.org/10.1186/s12859-016-1423-9
_version_ 1782490801062805504
author Radovic, Milos
Ghalwash, Mohamed
Filipovic, Nenad
Obradovic, Zoran
collection PubMed
description BACKGROUND: Feature selection, which aims to identify, within a possibly large set of features, the subset relevant for predicting a response, is an important preprocessing step in machine learning. In gene expression studies this is not a trivial task for several reasons, including the potentially temporal character of the data. However, most feature selection approaches developed for microarray data cannot handle multivariate temporal data without first flattening the data, which results in loss of temporal information. We propose a temporal minimum redundancy - maximum relevance (TMRMR) feature selection approach, which handles multivariate temporal data without prior flattening. In the proposed approach we compute the relevance of a gene by averaging F-statistic values calculated across individual time steps, and we compute the redundancy between genes by using a dynamic time warping approach. RESULTS: The proposed method is evaluated on three temporal gene expression datasets from human viral challenge studies. The results show that the proposed method outperforms alternatives widely used in gene expression studies. In particular, the proposed method improved accuracy in 34 out of 54 experiments, while the other methods outperformed it in no more than 4 experiments. CONCLUSION: We developed a filter-based feature selection method for temporal gene expression data based on maximum relevance and minimum redundancy criteria. The proposed method incorporates temporal information by combining relevance, calculated as the average F-statistic value across time steps, with redundancy, calculated using dynamic time warping. As our experiments show, incorporating temporal information into the feature selection process leads to the selection of more discriminative features.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1423-9) contains supplementary material, which is available to authorized users.
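The abstract describes relevance as an average of per-time-step F-statistics and redundancy as a dynamic time warping comparison between genes. The following is a minimal Python sketch of that idea, not the authors' implementation. The abstract does not say how a warping distance becomes a redundancy score or how relevance and redundancy are combined, so the `1 / (1 + distance)` conversion, the quotient criterion, the greedy search, the `X[sample][gene][time]` layout, and all function names are assumptions made for illustration.

```python
from typing import List, Sequence


def f_statistic(values: Sequence[float], labels: Sequence[int]) -> float:
    """One-way ANOVA F-statistic of `values` grouped by class `labels`."""
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(c, []).append(v)
    n, k = len(values), len(groups)
    grand = sum(values) / n
    between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    if within == 0.0:
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping distance with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def tmrmr_select(X: List[List[List[float]]], y: Sequence[int], k: int) -> List[int]:
    """Greedily pick up to k gene indices from X[sample][gene][time]."""
    n_genes, n_steps = len(X[0]), len(X[0][0])
    if k < 1:
        return []
    # Relevance: F-statistic at each time step, averaged over time steps.
    relevance = [
        sum(f_statistic([s[g][t] for s in X], y) for t in range(n_steps)) / n_steps
        for g in range(n_genes)
    ]

    def redundancy(g: int, h: int) -> float:
        # ASSUMPTION: mean per-sample DTW distance mapped to a (0, 1] similarity.
        mean_dist = sum(dtw_distance(s[g], s[h]) for s in X) / len(X)
        return 1.0 / (1.0 + mean_dist)

    def score(g: int) -> float:
        # ASSUMPTION: quotient criterion (relevance / mean redundancy).
        mean_red = sum(redundancy(g, h) for h in selected) / len(selected)
        return relevance[g] / mean_red

    selected = [max(range(n_genes), key=relevance.__getitem__)]
    while len(selected) < min(k, n_genes):
        rest = [g for g in range(n_genes) if g not in selected]
        selected.append(max(rest, key=score))
    return selected
```

Calling `tmrmr_select(X, y, k)` with `X` as a nested list of per-sample, per-gene time series returns gene indices in selection order. Because raw warping distances depend on expression scale, a real pipeline would likely normalize each gene before comparing them; that step is omitted here.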
format Online
Article
Text
id pubmed-5209828
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5209828 2017-01-04 Minimum redundancy maximum relevance feature selection approach for temporal gene expression data Radovic, Milos Ghalwash, Mohamed Filipovic, Nenad Obradovic, Zoran BMC Bioinformatics Methodology Article BioMed Central 2017-01-03 /pmc/articles/PMC5209828/ /pubmed/28049413 http://dx.doi.org/10.1186/s12859-016-1423-9 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Minimum redundancy maximum relevance feature selection approach for temporal gene expression data
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5209828/
https://www.ncbi.nlm.nih.gov/pubmed/28049413
http://dx.doi.org/10.1186/s12859-016-1423-9