Sparse data embedding and prediction by tropical matrix factorization
BACKGROUND: Matrix factorization methods are linear models, with limited capability to model complex relations. In our work, we use tropical semiring to introduce non-linearity into matrix factorization models. We propose a method called Sparse Tropical Matrix Factorization (STMF) for the estimation...
Main authors: | Omanović, Amra; Kazan, Hilal; Oblak, Polona; Curk, Tomaž |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7908717/ https://www.ncbi.nlm.nih.gov/pubmed/33632116 http://dx.doi.org/10.1186/s12859-021-04023-9 |
_version_ | 1783655776701644800 |
---|---|
author | Omanović, Amra Kazan, Hilal Oblak, Polona Curk, Tomaž |
author_facet | Omanović, Amra Kazan, Hilal Oblak, Polona Curk, Tomaž |
author_sort | Omanović, Amra |
collection | PubMed |
description | BACKGROUND: Matrix factorization methods are linear models, with limited capability to model complex relations. In our work, we use tropical semiring to introduce non-linearity into matrix factorization models. We propose a method called Sparse Tropical Matrix Factorization (STMF) for the estimation of missing (unknown) values in sparse data. RESULTS: We evaluate the efficiency of the STMF method on both synthetic data and biological data in the form of gene expression measurements downloaded from The Cancer Genome Atlas (TCGA) database. Tests on unique synthetic data showed that STMF approximation achieves a higher correlation than non-negative matrix factorization (NMF), which is unable to recover patterns effectively. On real data, STMF outperforms NMF on six out of nine gene expression datasets. While NMF assumes normal distribution and tends toward the mean value, STMF can better fit to extreme values and distributions. CONCLUSION: STMF is the first work that uses tropical semiring on sparse data. We show that in certain cases semirings are useful because they consider the structure, which is different and simpler to understand than it is with standard linear algebra. |
format | Online Article Text |
id | pubmed-7908717 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7908717 2021-02-26 Sparse data embedding and prediction by tropical matrix factorization Omanović, Amra Kazan, Hilal Oblak, Polona Curk, Tomaž BMC Bioinformatics Research Article BACKGROUND: Matrix factorization methods are linear models, with limited capability to model complex relations. In our work, we use tropical semiring to introduce non-linearity into matrix factorization models. We propose a method called Sparse Tropical Matrix Factorization (STMF) for the estimation of missing (unknown) values in sparse data. RESULTS: We evaluate the efficiency of the STMF method on both synthetic data and biological data in the form of gene expression measurements downloaded from The Cancer Genome Atlas (TCGA) database. Tests on unique synthetic data showed that STMF approximation achieves a higher correlation than non-negative matrix factorization (NMF), which is unable to recover patterns effectively. On real data, STMF outperforms NMF on six out of nine gene expression datasets. While NMF assumes normal distribution and tends toward the mean value, STMF can better fit to extreme values and distributions. CONCLUSION: STMF is the first work that uses tropical semiring on sparse data. We show that in certain cases semirings are useful because they consider the structure, which is different and simpler to understand than it is with standard linear algebra. BioMed Central 2021-02-25 /pmc/articles/PMC7908717/ /pubmed/33632116 http://dx.doi.org/10.1186/s12859-021-04023-9 Text en © The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. 
The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Article Omanović, Amra Kazan, Hilal Oblak, Polona Curk, Tomaž Sparse data embedding and prediction by tropical matrix factorization |
title | Sparse data embedding and prediction by tropical matrix factorization |
title_full | Sparse data embedding and prediction by tropical matrix factorization |
title_fullStr | Sparse data embedding and prediction by tropical matrix factorization |
title_full_unstemmed | Sparse data embedding and prediction by tropical matrix factorization |
title_short | Sparse data embedding and prediction by tropical matrix factorization |
title_sort | sparse data embedding and prediction by tropical matrix factorization |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7908717/ https://www.ncbi.nlm.nih.gov/pubmed/33632116 http://dx.doi.org/10.1186/s12859-021-04023-9 |
work_keys_str_mv | AT omanovicamra sparsedataembeddingandpredictionbytropicalmatrixfactorization AT kazanhilal sparsedataembeddingandpredictionbytropicalmatrixfactorization AT oblakpolona sparsedataembeddingandpredictionbytropicalmatrixfactorization AT curktomaz sparsedataembeddingandpredictionbytropicalmatrixfactorization |
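
Note on the abstract: the description says a tropical semiring introduces non-linearity into matrix factorization. The sketch below is not the STMF algorithm; it only illustrates a tropical matrix product, under the assumption that the max-plus convention is meant (this record does not say which convention the paper uses). Addition is replaced by max and multiplication by +. The function name `maxplus_matmul` and the small factor matrices `U` and `V` are hypothetical, chosen for illustration.

```python
def maxplus_matmul(U, V):
    """Tropical (max-plus) product: R[i][j] = max over k of (U[i][k] + V[k][j])."""
    inner = len(V)
    # Every row of U must have as many entries as V has rows.
    assert all(len(row) == inner for row in U), "inner dimensions must match"
    n_cols = len(V[0])
    return [
        [max(U[i][k] + V[k][j] for k in range(inner)) for j in range(n_cols)]
        for i in range(len(U))
    ]


# Hypothetical 2x2 factor matrices (rank 2), for illustration only.
U = [[0.0, 2.0],
     [1.0, 0.0]]
V = [[1.0, 3.0],
     [0.0, 1.0]]

R = maxplus_matmul(U, V)
print(R)  # [[2.0, 3.0], [2.0, 4.0]]
```

Each entry of `R` is decided by a single dominant term rather than by a sum over all terms, which is one way to read the abstract's remark that the approach can fit extreme values better than a linear model.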