Cargando…

Tensor-Decomposition-Based Unsupervised Feature Extraction Applied to Prostate Cancer Multiomics Data

The large p small n problem is a challenge without a de facto standard method available to it. In this study, we propose a tensor-decomposition (TD)-based unsupervised feature extraction (FE) formalism applied to multiomics datasets, in which the number of features is more than 100,000 whereas the n...

Descripción completa

Detalles Bibliográficos
Autores principales: Taguchi, Y-h., Turki, Turki
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2020
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7763286/
https://www.ncbi.nlm.nih.gov/pubmed/33322492
http://dx.doi.org/10.3390/genes11121493
_version_ 1783627982411137024
author Taguchi, Y-h.
Turki, Turki
author_facet Taguchi, Y-h.
Turki, Turki
author_sort Taguchi, Y-h.
collection PubMed
description The large p small n problem is a challenge without a de facto standard method available to it. In this study, we propose a tensor-decomposition (TD)-based unsupervised feature extraction (FE) formalism applied to multiomics datasets, in which the number of features is more than 100,000 whereas the number of samples is as small as about 100, hence constituting a typical large p small n problem. The proposed TD-based unsupervised FE outperformed other conventional supervised feature selection methods, random forest, categorical regression (also known as analysis of variance, or ANOVA), penalized linear discriminant analysis, and two unsupervised methods, multiple non-negative matrix factorization and principal component analysis (PCA) based unsupervised FE when applied to synthetic datasets and four methods other than PCA based unsupervised FE when applied to multiomics datasets. The genes selected by TD-based unsupervised FE were enriched in genes known to be related to tissues and transcription factors measured. TD-based unsupervised FE was demonstrated to be not only the superior feature selection method but also the method that can select biologically reliable genes. To our knowledge, this is the first study in which TD-based unsupervised FE has been successfully applied to the integration of this variety of multiomics measurements.
format Online
Article
Text
id pubmed-7763286
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-77632862020-12-27 Tensor-Decomposition-Based Unsupervised Feature Extraction Applied to Prostate Cancer Multiomics Data Taguchi, Y-h. Turki, Turki Genes (Basel) Article The large p small n problem is a challenge without a de facto standard method available to it. In this study, we propose a tensor-decomposition (TD)-based unsupervised feature extraction (FE) formalism applied to multiomics datasets, in which the number of features is more than 100,000 whereas the number of samples is as small as about 100, hence constituting a typical large p small n problem. The proposed TD-based unsupervised FE outperformed other conventional supervised feature selection methods, random forest, categorical regression (also known as analysis of variance, or ANOVA), penalized linear discriminant analysis, and two unsupervised methods, multiple non-negative matrix factorization and principal component analysis (PCA) based unsupervised FE when applied to synthetic datasets and four methods other than PCA based unsupervised FE when applied to multiomics datasets. The genes selected by TD-based unsupervised FE were enriched in genes known to be related to tissues and transcription factors measured. TD-based unsupervised FE was demonstrated to be not only the superior feature selection method but also the method that can select biologically reliable genes. To our knowledge, this is the first study in which TD-based unsupervised FE has been successfully applied to the integration of this variety of multiomics measurements. MDPI 2020-12-11 /pmc/articles/PMC7763286/ /pubmed/33322492 http://dx.doi.org/10.3390/genes11121493 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Taguchi, Y-h.
Turki, Turki
Tensor-Decomposition-Based Unsupervised Feature Extraction Applied to Prostate Cancer Multiomics Data
title Tensor-Decomposition-Based Unsupervised Feature Extraction Applied to Prostate Cancer Multiomics Data
title_full Tensor-Decomposition-Based Unsupervised Feature Extraction Applied to Prostate Cancer Multiomics Data
title_fullStr Tensor-Decomposition-Based Unsupervised Feature Extraction Applied to Prostate Cancer Multiomics Data
title_full_unstemmed Tensor-Decomposition-Based Unsupervised Feature Extraction Applied to Prostate Cancer Multiomics Data
title_short Tensor-Decomposition-Based Unsupervised Feature Extraction Applied to Prostate Cancer Multiomics Data
title_sort tensor-decomposition-based unsupervised feature extraction applied to prostate cancer multiomics data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7763286/
https://www.ncbi.nlm.nih.gov/pubmed/33322492
http://dx.doi.org/10.3390/genes11121493
work_keys_str_mv AT taguchiyh tensordecompositionbasedunsupervisedfeatureextractionappliedtoprostatecancermultiomicsdata
AT turkiturki tensordecompositionbasedunsupervisedfeatureextractionappliedtoprostatecancermultiomicsdata