Heuristic algorithms for feature selection under Bayesian models with block-diagonal covariance structure
BACKGROUND: Many bioinformatics studies aim to identify markers, or features, that can be used to discriminate between distinct groups. In problems where strong individual markers are not available, or where interactions between gene products are of primary interest, it may be necessary to consider...
Main Authors: | Foroughi pour, Ali; Dalton, Lori A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5872553/ https://www.ncbi.nlm.nih.gov/pubmed/29589558 http://dx.doi.org/10.1186/s12859-018-2059-8 |
_version_ | 1783309862196740096 |
---|---|
author | Foroughi pour, Ali Dalton, Lori A. |
author_facet | Foroughi pour, Ali Dalton, Lori A. |
author_sort | Foroughi pour, Ali |
collection | PubMed |
description | BACKGROUND: Many bioinformatics studies aim to identify markers, or features, that can be used to discriminate between distinct groups. In problems where strong individual markers are not available, or where interactions between gene products are of primary interest, it may be necessary to consider combinations of features as a marker family. To this end, recent work proposes a hierarchical Bayesian framework for feature selection that places a prior on the set of features we wish to select and on the label-conditioned feature distribution. While an analytical posterior under Gaussian models with block covariance structures is available, the optimal feature selection algorithm for this model remains intractable since it requires evaluating the posterior over the space of all possible covariance block structures and feature-block assignments. To address this computational barrier, in prior work we proposed a simple suboptimal algorithm, 2MNC-Robust, with robust performance across the space of block structures. Here, we present three new heuristic feature selection algorithms. RESULTS: The proposed algorithms outperform 2MNC-Robust and many other popular feature selection algorithms on synthetic data. In addition, enrichment analysis on real breast cancer, colon cancer, and leukemia data indicates they also output many of the genes and pathways linked to the cancers under study. CONCLUSIONS: Bayesian feature selection is a promising framework for small-sample high-dimensional data, in particular biomarker discovery applications. When applied to cancer data, these algorithms output many genes already shown to be involved in cancer as well as potentially new biomarkers. Furthermore, one of the proposed algorithms, SPM, outputs blocks of heavily correlated genes, particularly useful for studying gene interactions and gene networks. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2059-8) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5872553 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5872553 2018-04-02 Heuristic algorithms for feature selection under Bayesian models with block-diagonal covariance structure Foroughi pour, Ali Dalton, Lori A. BMC Bioinformatics Research BACKGROUND: Many bioinformatics studies aim to identify markers, or features, that can be used to discriminate between distinct groups. In problems where strong individual markers are not available, or where interactions between gene products are of primary interest, it may be necessary to consider combinations of features as a marker family. To this end, recent work proposes a hierarchical Bayesian framework for feature selection that places a prior on the set of features we wish to select and on the label-conditioned feature distribution. While an analytical posterior under Gaussian models with block covariance structures is available, the optimal feature selection algorithm for this model remains intractable since it requires evaluating the posterior over the space of all possible covariance block structures and feature-block assignments. To address this computational barrier, in prior work we proposed a simple suboptimal algorithm, 2MNC-Robust, with robust performance across the space of block structures. Here, we present three new heuristic feature selection algorithms. RESULTS: The proposed algorithms outperform 2MNC-Robust and many other popular feature selection algorithms on synthetic data. In addition, enrichment analysis on real breast cancer, colon cancer, and Leukemia data indicates they also output many of the genes and pathways linked to the cancers under study. CONCLUSIONS: Bayesian feature selection is a promising framework for small-sample high-dimensional data, in particular biomarker discovery applications. When applied to cancer data these algorithms outputted many genes already shown to be involved in cancer as well as potentially new biomarkers. Furthermore, one of the proposed algorithms, SPM, outputs blocks of heavily correlated genes, particularly useful for studying gene interactions and gene networks. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2059-8) contains supplementary material, which is available to authorized users. BioMed Central 2018-03-21 /pmc/articles/PMC5872553/ /pubmed/29589558 http://dx.doi.org/10.1186/s12859-018-2059-8 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Foroughi pour, Ali Dalton, Lori A. Heuristic algorithms for feature selection under Bayesian models with block-diagonal covariance structure |
title | Heuristic algorithms for feature selection under Bayesian models with block-diagonal covariance structure |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5872553/ https://www.ncbi.nlm.nih.gov/pubmed/29589558 http://dx.doi.org/10.1186/s12859-018-2059-8 |
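
The description in this record states that the optimal selection rule is intractable because the posterior must be evaluated over every possible covariance block structure. As a rough illustration of that growth only (not the authors' algorithm, and not any of the heuristics named in the record), the number of ways to split n features into unlabeled blocks is the Bell number B(n). The minimal sketch below computes it with the Bell triangle; the function name `bell_numbers` is hypothetical, and the count ignores the additional feature-block assignment choices the description mentions, so the true search space is at least this large.

```python
def bell_numbers(n_max):
    """Return [B(0), ..., B(n_max)], where B(n) counts the ways to
    partition n items (here: features) into non-empty unlabeled blocks.

    Uses the Bell triangle: each row starts with the last entry of the
    previous row, and each next entry adds the entry above-left.
    """
    if n_max < 0:
        raise ValueError("n_max must be non-negative")
    bells = [1]   # B(0) = 1 (the empty partition)
    row = [1]     # first row of the Bell triangle
    for _ in range(n_max):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


if __name__ == "__main__":
    counts = bell_numbers(20)
    for n in (5, 10, 20):
        # Number of candidate block structures for n features.
        print(f"n = {n:2d} features -> {counts[n]} block structures")
```

Even at a few dozen features the count is far beyond exhaustive enumeration, which is consistent with the record's motivation for heuristic search over block structures.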