Block HSIC Lasso: model-free biomarker detection for ultra-high dimensional data

MOTIVATION: Finding non-linear relationships between biomolecules and a biological outcome is computationally expensive and statistically challenging. Existing methods have important drawbacks, including among others lack of parsimony, non-convexity and computational overhead. Here we propose block...

Bibliographic Details
Main authors: Climente-González, Héctor; Azencott, Chloé-Agathe; Kaski, Samuel; Yamada, Makoto
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6612810/
https://www.ncbi.nlm.nih.gov/pubmed/31510671
http://dx.doi.org/10.1093/bioinformatics/btz333
author Climente-González, Héctor
Azencott, Chloé-Agathe
Kaski, Samuel
Yamada, Makoto
collection PubMed
description MOTIVATION: Finding non-linear relationships between biomolecules and a biological outcome is computationally expensive and statistically challenging. Existing methods have important drawbacks, including among others lack of parsimony, non-convexity and computational overhead. Here we propose block HSIC Lasso, a non-linear feature selector that does not present the previous drawbacks. RESULTS: We compare block HSIC Lasso to other state-of-the-art feature selection techniques in both synthetic and real data, including experiments over three common types of genomic data: gene-expression microarrays, single-cell RNA sequencing and genome-wide association studies. In all cases, we observe that features selected by block HSIC Lasso retain more information about the underlying biology than those selected by other techniques. As a proof of concept, we applied block HSIC Lasso to a single-cell RNA sequencing experiment on mouse hippocampus. We discovered that many genes linked in the past to brain development and function are involved in the biological differences between the types of neurons. AVAILABILITY AND IMPLEMENTATION: Block HSIC Lasso is implemented in the Python 2/3 package pyHSICLasso, available on PyPI. Source code is available on GitHub (https://github.com/riken-aip/pyHSICLasso). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-6612810
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6612810 2019-07-12 Bioinformatics Ismb/Eccb 2019 Conference Proceedings Oxford University Press 2019-07 2019-07-05 /pmc/articles/PMC6612810/ /pubmed/31510671 http://dx.doi.org/10.1093/bioinformatics/btz333 Text en © The Author(s) 2019. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Block HSIC Lasso: model-free biomarker detection for ultra-high dimensional data
topic Ismb/Eccb 2019 Conference Proceedings