Feature selection for high-dimensional temporal data
BACKGROUND: Feature selection is commonly employed for identifying collectively-predictive biomarkers and biosignatures; it facilitates the construction of small statistical models that are easier to verify, visualize, and comprehend while providing insight to the human expert. In this work we exten...
Main Authors: | Tsagris, Michail, Lagani, Vincenzo, Tsamardinos, Ioannis |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5778658/ https://www.ncbi.nlm.nih.gov/pubmed/29357817 http://dx.doi.org/10.1186/s12859-018-2023-7 |
_version_ | 1783294396253339648 |
---|---|
author | Tsagris, Michail Lagani, Vincenzo Tsamardinos, Ioannis |
author_facet | Tsagris, Michail Lagani, Vincenzo Tsamardinos, Ioannis |
author_sort | Tsagris, Michail |
collection | PubMed |
description | BACKGROUND: Feature selection is commonly employed for identifying collectively-predictive biomarkers and biosignatures; it facilitates the construction of small statistical models that are easier to verify, visualize, and comprehend while providing insight to the human expert. In this work we extend established constrained-based, feature-selection methods to high-dimensional “omics” temporal data, where the number of measurements is orders of magnitude larger than the sample size. The extension required the development of conditional independence tests for temporal and/or static variables conditioned on a set of temporal variables. RESULTS: The algorithm is able to return multiple, equivalent solution subsets of variables, scale to tens of thousands of features, and outperform or be on par with existing methods depending on the analysis task specifics. CONCLUSIONS: The use of this algorithm is suggested for variable selection with high-dimensional temporal data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-018-2023-7) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5778658 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-57786582018-01-31 Feature selection for high-dimensional temporal data Tsagris, Michail Lagani, Vincenzo Tsamardinos, Ioannis BMC Bioinformatics Research Article BACKGROUND: Feature selection is commonly employed for identifying collectively-predictive biomarkers and biosignatures; it facilitates the construction of small statistical models that are easier to verify, visualize, and comprehend while providing insight to the human expert. In this work we extend established constrained-based, feature-selection methods to high-dimensional “omics” temporal data, where the number of measurements is orders of magnitude larger than the sample size. The extension required the development of conditional independence tests for temporal and/or static variables conditioned on a set of temporal variables. RESULTS: The algorithm is able to return multiple, equivalent solution subsets of variables, scale to tens of thousands of features, and outperform or be on par with existing methods depending on the analysis task specifics. CONCLUSIONS: The use of this algorithm is suggested for variable selection with high-dimensional temporal data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-018-2023-7) contains supplementary material, which is available to authorized users. BioMed Central 2018-01-23 /pmc/articles/PMC5778658/ /pubmed/29357817 http://dx.doi.org/10.1186/s12859-018-2023-7 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. 
The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Tsagris, Michail Lagani, Vincenzo Tsamardinos, Ioannis Feature selection for high-dimensional temporal data |
title | Feature selection for high-dimensional temporal data |
title_full | Feature selection for high-dimensional temporal data |
title_fullStr | Feature selection for high-dimensional temporal data |
title_full_unstemmed | Feature selection for high-dimensional temporal data |
title_short | Feature selection for high-dimensional temporal data |
title_sort | feature selection for high-dimensional temporal data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5778658/ https://www.ncbi.nlm.nih.gov/pubmed/29357817 http://dx.doi.org/10.1186/s12859-018-2023-7 |
work_keys_str_mv | AT tsagrismichail featureselectionforhighdimensionaltemporaldata AT laganivincenzo featureselectionforhighdimensionaltemporaldata AT tsamardinosioannis featureselectionforhighdimensionaltemporaldata |