Minimum redundancy maximum relevance feature selection approach for temporal gene expression data
BACKGROUND: Feature selection, aiming to identify a subset of features among a possibly large set of features that are relevant for predicting a response, is an important preprocessing step in machine learning. In gene expression studies this is not a trivial task for several reasons, including pote...
Main authors: | Radovic, Milos; Ghalwash, Mohamed; Filipovic, Nenad; Obradovic, Zoran |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5209828/ https://www.ncbi.nlm.nih.gov/pubmed/28049413 http://dx.doi.org/10.1186/s12859-016-1423-9 |
_version_ | 1782490801062805504 |
---|---|
author | Radovic, Milos Ghalwash, Mohamed Filipovic, Nenad Obradovic, Zoran |
author_facet | Radovic, Milos Ghalwash, Mohamed Filipovic, Nenad Obradovic, Zoran |
author_sort | Radovic, Milos |
collection | PubMed |
description | BACKGROUND: Feature selection, aiming to identify a subset of features among a possibly large set of features that are relevant for predicting a response, is an important preprocessing step in machine learning. In gene expression studies this is not a trivial task for several reasons, including potential temporal character of data. However, most feature selection approaches developed for microarray data cannot handle multivariate temporal data without previous data flattening, which results in loss of temporal information. We propose a temporal minimum redundancy - maximum relevance (TMRMR) feature selection approach, which is able to handle multivariate temporal data without previous data flattening. In the proposed approach we compute relevance of a gene by averaging F-statistic values calculated across individual time steps, and we compute redundancy between genes by using a dynamical time warping approach. RESULTS: The proposed method is evaluated on three temporal gene expression datasets from human viral challenge studies. Obtained results show that the proposed method outperforms alternatives widely used in gene expression studies. In particular, the proposed method achieved improvement in accuracy in 34 out of 54 experiments, while the other methods outperformed it in no more than 4 experiments. CONCLUSION: We developed a filter-based feature selection method for temporal gene expression data based on maximum relevance and minimum redundancy criteria. The proposed method incorporates temporal information by combining relevance, which is calculated as an average F-statistic value across different time steps, with redundancy, which is calculated by employing dynamical time warping approach. As evident in our experiments, incorporating the temporal information into the feature selection process leads to selection of more discriminative features. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1423-9) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5209828 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-52098282017-01-04 Minimum redundancy maximum relevance feature selection approach for temporal gene expression data Radovic, Milos Ghalwash, Mohamed Filipovic, Nenad Obradovic, Zoran BMC Bioinformatics Methodology Article BACKGROUND: Feature selection, aiming to identify a subset of features among a possibly large set of features that are relevant for predicting a response, is an important preprocessing step in machine learning. In gene expression studies this is not a trivial task for several reasons, including potential temporal character of data. However, most feature selection approaches developed for microarray data cannot handle multivariate temporal data without previous data flattening, which results in loss of temporal information. We propose a temporal minimum redundancy - maximum relevance (TMRMR) feature selection approach, which is able to handle multivariate temporal data without previous data flattening. In the proposed approach we compute relevance of a gene by averaging F-statistic values calculated across individual time steps, and we compute redundancy between genes by using a dynamical time warping approach. RESULTS: The proposed method is evaluated on three temporal gene expression datasets from human viral challenge studies. Obtained results show that the proposed method outperforms alternatives widely used in gene expression studies. In particular, the proposed method achieved improvement in accuracy in 34 out of 54 experiments, while the other methods outperformed it in no more than 4 experiments. CONCLUSION: We developed a filter-based feature selection method for temporal gene expression data based on maximum relevance and minimum redundancy criteria. The proposed method incorporates temporal information by combining relevance, which is calculated as an average F-statistic value across different time steps, with redundancy, which is calculated by employing dynamical time warping approach. 
As evident in our experiments, incorporating the temporal information into the feature selection process leads to selection of more discriminative features. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1423-9) contains supplementary material, which is available to authorized users. BioMed Central 2017-01-03 /pmc/articles/PMC5209828/ /pubmed/28049413 http://dx.doi.org/10.1186/s12859-016-1423-9 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Radovic, Milos Ghalwash, Mohamed Filipovic, Nenad Obradovic, Zoran Minimum redundancy maximum relevance feature selection approach for temporal gene expression data |
title | Minimum redundancy maximum relevance feature selection approach for temporal gene expression data |
title_full | Minimum redundancy maximum relevance feature selection approach for temporal gene expression data |
title_fullStr | Minimum redundancy maximum relevance feature selection approach for temporal gene expression data |
title_full_unstemmed | Minimum redundancy maximum relevance feature selection approach for temporal gene expression data |
title_short | Minimum redundancy maximum relevance feature selection approach for temporal gene expression data |
title_sort | minimum redundancy maximum relevance feature selection approach for temporal gene expression data |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5209828/ https://www.ncbi.nlm.nih.gov/pubmed/28049413 http://dx.doi.org/10.1186/s12859-016-1423-9 |
work_keys_str_mv | AT radovicmilos minimumredundancymaximumrelevancefeatureselectionapproachfortemporalgeneexpressiondata AT ghalwashmohamed minimumredundancymaximumrelevancefeatureselectionapproachfortemporalgeneexpressiondata AT filipovicnenad minimumredundancymaximumrelevancefeatureselectionapproachfortemporalgeneexpressiondata AT obradoviczoran minimumredundancymaximumrelevancefeatureselectionapproachfortemporalgeneexpressiondata |
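
The abstract in this record describes the TMRMR idea only in outline: relevance is the average of per-time-step F-statistics, and redundancy comes from dynamic time warping (DTW) between gene time series. The following Python sketch illustrates how those two ingredients could be combined; it is not the authors' implementation. Several details are assumptions that the record does not specify: the data layout `(subjects, genes, time steps)`, the conversion of a DTW distance into a similarity as `1 / (1 + distance)`, the quotient form of the selection score, and all function names.

```python
import numpy as np


def f_statistic(values, labels):
    """One-way ANOVA F-statistic for 1-D values grouped by class labels."""
    classes = np.unique(labels)
    n, k = len(values), len(classes)
    grand_mean = values.mean()
    between = sum(
        (labels == c).sum() * (values[labels == c].mean() - grand_mean) ** 2
        for c in classes
    )
    within = sum(
        ((values[labels == c] - values[labels == c].mean()) ** 2).sum()
        for c in classes
    )
    # The small constant avoids division by zero (a choice of this sketch).
    return (between / (k - 1)) / (within / (n - k) + 1e-12)


def temporal_relevance(X, y):
    """Average each gene's F-statistic over time steps. X: (subjects, genes, time)."""
    _, n_genes, n_steps = X.shape
    return np.array([
        np.mean([f_statistic(X[:, g, t], y) for t in range(n_steps)])
        for g in range(n_genes)
    ])


def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    D = np.full((len(a) + 1, len(b) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[-1, -1]


def temporal_redundancy(X, g1, g2):
    """ASSUMPTION: redundancy = 1 / (1 + mean DTW distance over subjects)."""
    mean_dist = np.mean([dtw_distance(s[g1], s[g2]) for s in X])
    return 1.0 / (1.0 + mean_dist)


def select_genes(X, y, n_select):
    """Greedy selection. ASSUMPTION: score = relevance / mean redundancy."""
    relevance = temporal_relevance(X, y)
    n_select = min(n_select, X.shape[1])
    selected = [int(np.argmax(relevance))]  # start with the most relevant gene
    while len(selected) < n_select:
        best_gene, best_score = None, -np.inf
        for g in range(X.shape[1]):
            if g in selected:
                continue
            redundancy = np.mean(
                [temporal_redundancy(X, g, s) for s in selected]
            )
            score = relevance[g] / redundancy
            if score > best_score:
                best_gene, best_score = g, score
        selected.append(best_gene)
    return selected
```

Calling `select_genes(X, y, n_select=10)` on an array `X` of shape `(subjects, genes, time steps)` with class labels `y` returns the indices of the chosen genes in selection order. The pure-Python DTW loop is quadratic in series length and is evaluated for every candidate pair, so this sketch is only practical for small gene sets.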