
SMG: self-supervised masked graph learning for cancer gene identification

Cancer genomics is dedicated to elucidating the genes and pathways that contribute to cancer progression and development. Identifying cancer genes (CGs) associated with the initiation and progression of cancer is critical for characterizing molecular-level mechanisms in cancer research. In recen...


Bibliographic Details
Main authors: Cui, Yan; Wang, Zhikang; Wang, Xiaoyu; Zhang, Yiwen; Zhang, Ying; Pan, Tong; Zhang, Zhe; Li, Shanshan; Guo, Yuming; Akutsu, Tatsuya; Song, Jiangning
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10639095/
https://www.ncbi.nlm.nih.gov/pubmed/37950905
http://dx.doi.org/10.1093/bib/bbad406
author Cui, Yan
Wang, Zhikang
Wang, Xiaoyu
Zhang, Yiwen
Zhang, Ying
Pan, Tong
Zhang, Zhe
Li, Shanshan
Guo, Yuming
Akutsu, Tatsuya
Song, Jiangning
author_sort Cui, Yan
collection PubMed
description Cancer genomics is dedicated to elucidating the genes and pathways that contribute to cancer progression and development. Identifying cancer genes (CGs) associated with the initiation and progression of cancer is critical for characterizing molecular-level mechanisms in cancer research. In recent years, the growing availability of high-throughput molecular data and advancements in deep learning technologies have enabled the modelling of complex interactions and topological information within genomic data. Nevertheless, because labelled data are limited, pinpointing CGs from a multitude of potential mutations remains an exceptionally challenging task. To address this, we propose a novel deep learning framework, termed self-supervised masked graph learning (SMG), which comprises SMG reconstruction (pretext task) and task-specific fine-tuning (downstream task). In the pretext task, nodes of protein–protein interaction (PPI) networks carrying multi-omic features are randomly replaced with a defined mask token. The PPI networks are then reconstructed using a graph neural network (GNN)-based autoencoder, which explores the node correlations in a self-prediction manner. In the downstream tasks, the pre-trained GNN encoder embeds the input networks into feature graphs, and a task-specific layer produces the final prediction. To assess the performance of the proposed SMG method, benchmarking experiments are performed on three node-level tasks (identification of CGs, essential genes and healthy driver genes) and one graph-level task (identification of disease subnetwork) across eight PPI networks. These experiments, together with performance comparisons against existing state-of-the-art methods, demonstrate the superiority of SMG on multi-omic feature engineering.
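The masking step of the pretext task described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy graph, the zero-vector mask token, the 50% mask rate, and the single neighbour-averaging step used in place of a trained GNN autoencoder are all assumptions chosen to keep the example self-contained, and every function name is hypothetical.

```python
import numpy as np


def mask_nodes(features, mask_rate, mask_token, rng):
    """Replace the feature rows of a random subset of nodes with mask_token."""
    num_nodes = features.shape[0]
    num_masked = max(1, int(round(mask_rate * num_nodes)))
    masked_idx = rng.choice(num_nodes, size=num_masked, replace=False)
    masked = features.copy()  # leave the original features untouched
    masked[masked_idx] = mask_token
    return masked, masked_idx


def mean_aggregate(adjacency, features):
    """One neighbour-averaging step (with self-loops).

    Assumption: this untrained step only stands in for a GNN autoencoder.
    """
    adj_self = adjacency + np.eye(adjacency.shape[0])
    degree = adj_self.sum(axis=1, keepdims=True)
    return (adj_self @ features) / degree


def masked_reconstruction_loss(original, reconstructed, masked_idx):
    """Mean squared error computed on the masked nodes only."""
    diff = original[masked_idx] - reconstructed[masked_idx]
    return float(np.mean(diff ** 2))


rng = np.random.default_rng(0)

# Hypothetical 4-node interaction graph with 3 features per node.
adjacency = np.array(
    [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
features = rng.normal(size=(4, 3))
mask_token = np.zeros(3)  # assumption: a fixed zero vector as the mask token

masked_features, masked_idx = mask_nodes(features, 0.5, mask_token, rng)
reconstructed = mean_aggregate(adjacency, masked_features)
loss = masked_reconstruction_loss(features, reconstructed, masked_idx)
print("masked nodes:", sorted(masked_idx.tolist()), "loss:", loss)
```

In a self-supervised setup of this kind, the reconstruction loss on masked nodes would be minimised by training the encoder and decoder; the sketch only shows how the masked input and the loss are formed.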
format Online
Article
Text
id pubmed-10639095
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10639095 2023-11-11 SMG: self-supervised masked graph learning for cancer gene identification Cui, Yan; Wang, Zhikang; Wang, Xiaoyu; Zhang, Yiwen; Zhang, Ying; Pan, Tong; Zhang, Zhe; Li, Shanshan; Guo, Yuming; Akutsu, Tatsuya; Song, Jiangning Brief Bioinform Problem Solving Protocol Oxford University Press 2023-11-08 /pmc/articles/PMC10639095/ /pubmed/37950905 http://dx.doi.org/10.1093/bib/bbad406 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title SMG: self-supervised masked graph learning for cancer gene identification
topic Problem Solving Protocol