
Feature selection for high-dimensional temporal data

BACKGROUND: Feature selection is commonly employed for identifying collectively-predictive biomarkers and biosignatures; it facilitates the construction of small statistical models that are easier to verify, visualize, and comprehend while providing insight to the human expert. In this work we exten...


Bibliographic Details
Main Authors: Tsagris, Michail; Lagani, Vincenzo; Tsamardinos, Ioannis
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5778658/
https://www.ncbi.nlm.nih.gov/pubmed/29357817
http://dx.doi.org/10.1186/s12859-018-2023-7
author Tsagris, Michail
Lagani, Vincenzo
Tsamardinos, Ioannis
collection PubMed
description BACKGROUND: Feature selection is commonly employed for identifying collectively-predictive biomarkers and biosignatures; it facilitates the construction of small statistical models that are easier to verify, visualize, and comprehend while providing insight to the human expert. In this work we extend established constraint-based feature-selection methods to high-dimensional “omics” temporal data, where the number of measurements is orders of magnitude larger than the sample size. The extension required the development of conditional independence tests for temporal and/or static variables conditioned on a set of temporal variables. RESULTS: The algorithm is able to return multiple, equivalent solution subsets of variables, scale to tens of thousands of features, and outperform or be on par with existing methods depending on the analysis task specifics. CONCLUSIONS: The use of this algorithm is suggested for variable selection with high-dimensional temporal data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-018-2023-7) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5778658
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5778658 2018-01-31 Feature selection for high-dimensional temporal data Tsagris, Michail; Lagani, Vincenzo; Tsamardinos, Ioannis BMC Bioinformatics Research Article BioMed Central 2018-01-23 /pmc/articles/PMC5778658/ /pubmed/29357817 http://dx.doi.org/10.1186/s12859-018-2023-7 Text en
© The Author(s) 2018. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Feature selection for high-dimensional temporal data
topic Research Article