3PNMF-MKL: A non-negative matrix factorization-based multiple kernel learning method for multi-modal data integration and its application to gene signature detection

Handling biomedical big data is a challenging task, and integrating multi-modal data and then mining significant features (gene signature detection) is particularly daunting. With this in mind, we propose a novel framework, namely, three-factor penaliz...

Bibliographic Details
Main authors: Mallik, Saurav, Sarkar, Anasua, Nath, Sagnik, Maulik, Ujjwal, Das, Supantha, Pati, Soumen Kumar, Ghosh, Soumadip, Zhao, Zhongming
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9971618/
https://www.ncbi.nlm.nih.gov/pubmed/36865387
http://dx.doi.org/10.3389/fgene.2023.1095330
_version_ 1784898136698781696
author Mallik, Saurav
Sarkar, Anasua
Nath, Sagnik
Maulik, Ujjwal
Das, Supantha
Pati, Soumen Kumar
Ghosh, Soumadip
Zhao, Zhongming
author_facet Mallik, Saurav
Sarkar, Anasua
Nath, Sagnik
Maulik, Ujjwal
Das, Supantha
Pati, Soumen Kumar
Ghosh, Soumadip
Zhao, Zhongming
author_sort Mallik, Saurav
collection PubMed
description Handling biomedical big data is a challenging task, and integrating multi-modal data and then mining significant features (gene signature detection) is particularly daunting. With this in mind, we propose a novel framework, namely, three-factor penalized, non-negative matrix factorization-based multiple kernel learning with soft margin hinge loss (3PNMF-MKL), for multi-modal data integration followed by gene signature detection. In brief, limma, which employs empirical Bayes statistics, was first applied to each individual molecular profile to extract the statistically significant features; the three-factor penalized non-negative matrix factorization method was then used for data/matrix fusion on the reduced feature sets. Multiple kernel learning models with soft margin hinge loss were deployed to estimate average accuracy scores and the area under the curve (AUC). Gene modules were identified by applying average linkage clustering followed by dynamic tree cut. The module with the highest correlation was considered the potential gene signature. We used an acute myeloid leukemia dataset from The Cancer Genome Atlas (TCGA) repository containing five molecular profiles. Our algorithm generated a 50-gene signature that achieved a high classification AUC score (0.827). We explored the functions of the signature genes using pathway and Gene Ontology (GO) databases. Our method outperformed the state-of-the-art methods in terms of AUC. Furthermore, we included comparative studies with other related methods to strengthen the case for our method. Finally, we note that our algorithm can be applied to any multi-modal dataset for data integration followed by gene module discovery.
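The matrix-fusion step named in the description can be illustrated with a minimal sketch of plain non-negative matrix tri-factorization, X ≈ F S Gᵀ, fitted with standard multiplicative updates. This is not the authors' 3PNMF method: the record does not give their penalty terms or update rules, so none are included here, and the function names (`tri_nmf`, `frob_error`, `mult_update`), the ranks `k1` and `k2`, and the toy matrix are all assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frob_error(x, f, s, g):
    """Squared Frobenius norm of X - F S G^T."""
    approx = matmul(matmul(f, s), transpose(g))
    return sum((xv - av) ** 2
               for xr, ar in zip(x, approx)
               for xv, av in zip(xr, ar))


def mult_update(m, numer, denom, eps=1e-12):
    """Element-wise M * numer / denom; non-negative inputs stay non-negative."""
    return [[mv * nv / (dv + eps) for mv, nv, dv in zip(mr, nr, dr)]
            for mr, nr, dr in zip(m, numer, denom)]


def tri_nmf(x, k1, k2, n_iter=200, seed=0):
    """Unpenalized tri-factorization X ~ F S G^T (illustrative, hypothetical helper)."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    # Strictly positive random initialization keeps every factor non-negative.
    f = [[rng.random() + 0.1 for _ in range(k1)] for _ in range(n)]
    s = [[rng.random() + 0.1 for _ in range(k2)] for _ in range(k1)]
    g = [[rng.random() + 0.1 for _ in range(k2)] for _ in range(m)]
    for _ in range(n_iter):
        # F <- F * (X G S^T) / (F S G^T G S^T)
        gst = matmul(g, transpose(s))
        f = mult_update(f, matmul(x, gst),
                        matmul(f, matmul(transpose(gst), gst)))
        # G <- G * (X^T F S) / (G S^T F^T F S)
        fs = matmul(f, s)
        g = mult_update(g, matmul(transpose(x), fs),
                        matmul(g, matmul(transpose(fs), fs)))
        # S <- S * (F^T X G) / (F^T F S G^T G)
        ftf = matmul(transpose(f), f)
        gtg = matmul(transpose(g), g)
        s = mult_update(s, matmul(matmul(transpose(f), x), g),
                        matmul(matmul(ftf, s), gtg))
    return f, s, g


if __name__ == "__main__":
    # Toy non-negative matrix (rows could stand for samples, columns for features).
    toy = [[1.0, 0.9, 0.1, 0.0],
           [0.8, 1.0, 0.0, 0.2],
           [0.1, 0.0, 1.0, 0.9],
           [0.0, 0.2, 0.9, 1.0]]
    f, s, g = tri_nmf(toy, k1=2, k2=2)
    print("reconstruction error:", frob_error(toy, f, s, g))
```

In a fusion setting one would typically factorize several modality matrices with shared factors and add regularization; the sketch covers only the single-matrix, unpenalized case.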
format Online
Article
Text
id pubmed-9971618
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9971618 2023-03-01 3PNMF-MKL: A non-negative matrix factorization-based multiple kernel learning method for multi-modal data integration and its application to gene signature detection Mallik, Saurav Sarkar, Anasua Nath, Sagnik Maulik, Ujjwal Das, Supantha Pati, Soumen Kumar Ghosh, Soumadip Zhao, Zhongming Front Genet Genetics Handling biomedical big data is a challenging task, and integrating multi-modal data and then mining significant features (gene signature detection) is particularly daunting. With this in mind, we propose a novel framework, namely, three-factor penalized, non-negative matrix factorization-based multiple kernel learning with soft margin hinge loss (3PNMF-MKL), for multi-modal data integration followed by gene signature detection. In brief, limma, which employs empirical Bayes statistics, was first applied to each individual molecular profile to extract the statistically significant features; the three-factor penalized non-negative matrix factorization method was then used for data/matrix fusion on the reduced feature sets. Multiple kernel learning models with soft margin hinge loss were deployed to estimate average accuracy scores and the area under the curve (AUC). Gene modules were identified by applying average linkage clustering followed by dynamic tree cut. The module with the highest correlation was considered the potential gene signature. We used an acute myeloid leukemia dataset from The Cancer Genome Atlas (TCGA) repository containing five molecular profiles. Our algorithm generated a 50-gene signature that achieved a high classification AUC score (0.827). We explored the functions of the signature genes using pathway and Gene Ontology (GO) databases. Our method outperformed the state-of-the-art methods in terms of AUC. Furthermore, we included comparative studies with other related methods to strengthen the case for our method. Finally, we note that our algorithm can be applied to any multi-modal dataset for data integration followed by gene module discovery. Frontiers Media S.A. 2023-02-14 /pmc/articles/PMC9971618/ /pubmed/36865387 http://dx.doi.org/10.3389/fgene.2023.1095330 Text en Copyright © 2023 Mallik, Sarkar, Nath, Maulik, Das, Pati, Ghosh and Zhao. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title 3PNMF-MKL: A non-negative matrix factorization-based multiple kernel learning method for multi-modal data integration and its application to gene signature detection
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9971618/
https://www.ncbi.nlm.nih.gov/pubmed/36865387
http://dx.doi.org/10.3389/fgene.2023.1095330