Functional and embedding feature analysis for pan-cancer classification

Bibliographic details
Main authors: Lu, Jian; Li, JiaRui; Ren, Jingxin; Ding, Shijian; Zeng, Zhenbing; Huang, Tao; Cai, Yu-Dong
Format: Online Article Text
Language: English
Published: Frontiers Media S.A., 2022
Subjects: Oncology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9559388/
https://www.ncbi.nlm.nih.gov/pubmed/36248961
http://dx.doi.org/10.3389/fonc.2022.979336
author Lu, Jian
Li, JiaRui
Ren, Jingxin
Ding, Shijian
Zeng, Zhenbing
Huang, Tao
Cai, Yu-Dong
collection PubMed
description With the increasing number of people suffering from cancer, this illness has become a major health problem worldwide. Exploring the biological functions and signaling pathways of carcinogenesis is essential for cancer detection and research. In this study, a mutation dataset for eleven cancer types was first obtained from cBioPortal for Cancer Genomics, a web-based resource, and 21,049 features were then extracted from three aspects: relationships to GO terms and KEGG pathways (enrichment features), embeddings of mutated genes learned by word2vec (text features), and embeddings of a protein-protein interaction network learned by node2vec (network features). Irrelevant features were excluded using the Boruta feature filtering method, and the retained relevant features were ranked by four feature selection methods (least absolute shrinkage and selection operator, minimum redundancy maximum relevance, Monte Carlo feature selection, and light gradient boosting machine) to generate four ranked feature lists. Incremental feature selection was applied to each list to determine the optimal number of features, build the optimal classifiers, and derive interpretable classification rules. The results of the four feature-ranking methods were integrated to identify key functional pathways, such as olfactory transduction (hsa04740) and colorectal cancer (hsa05210), and the roles of these pathways in cancer were discussed with reference to the literature. Overall, this machine learning-based study revealed altered biological functions in cancers and provided a reference for studying the mechanisms of different cancers.
format Online
Article
Text
id pubmed-9559388
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9559388 2022-10-14 Functional and embedding feature analysis for pan-cancer classification Lu, Jian Li, JiaRui Ren, Jingxin Ding, Shijian Zeng, Zhenbing Huang, Tao Cai, Yu-Dong Front Oncol Oncology Frontiers Media S.A.
2022-09-29 /pmc/articles/PMC9559388/ /pubmed/36248961 http://dx.doi.org/10.3389/fonc.2022.979336 Text en Copyright © 2022 Lu, Li, Ren, Ding, Zeng, Huang and Cai https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Functional and embedding feature analysis for pan-cancer classification
topic Oncology
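The incremental feature selection (IFS) step summarized in the description can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the ranked feature list is already given (in the study it comes from LASSO, mRMR, MCFS or LightGBM). A nearest-centroid classifier and leave-one-out accuracy stand in here for the paper's classifiers and evaluation metric, and all function names and the toy data are hypothetical.

```python
from math import dist
from statistics import fmean


def nearest_centroid_predict(train_x, train_y, x):
    """Return the training label whose class centroid is closest to x (Euclidean).

    Stand-in classifier chosen for brevity; not the classifier used in the study.
    """
    centroids = {}
    for label in sorted(set(train_y)):  # sorted for deterministic tie-breaking
        rows = [r for r, lab in zip(train_x, train_y) if lab == label]
        centroids[label] = [fmean(col) for col in zip(*rows)]
    return min(centroids, key=lambda lab: dist(centroids[lab], x))


def loo_accuracy(x, y):
    """Leave-one-out accuracy of the nearest-centroid classifier on (x, y)."""
    correct = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_x, train_y, x[i]) == y[i]:
            correct += 1
    return correct / len(x)


def incremental_feature_selection(x, y, ranking):
    """Score the top-k ranked features for k = 1..len(ranking).

    `ranking` lists column indices from most to least important.
    Returns (best_k, best_score, scores), where scores[k - 1] is the
    accuracy obtained with the first k features; ties go to the smallest k.
    """
    scores = []
    for k in range(1, len(ranking) + 1):
        cols = ranking[:k]
        subset = [[row[c] for c in cols] for row in x]
        scores.append(loo_accuracy(subset, y))
    best_k = max(range(1, len(scores) + 1), key=lambda k: scores[k - 1])
    return best_k, scores[best_k - 1], scores


# Hypothetical toy data: feature 0 separates classes A and B, feature 1 is noise.
X = [[0.0, 5.0], [0.2, 1.0], [0.1, 9.0], [1.0, 4.0], [0.9, 8.0], [1.1, 2.0]]
y = ["A", "A", "A", "B", "B", "B"]
best_k, best_score, scores = incremental_feature_selection(X, y, ranking=[0, 1])
# best_k == 1 and best_score == 1.0: adding the noisy feature lowers accuracy.
```

The IFS loop only grows a prefix of the ranked list one feature at a time and keeps the best-scoring size. Replacing the stand-in classifier or the leave-one-out evaluation changes `nearest_centroid_predict` and `loo_accuracy` but not the loop itself.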