Functional and embedding feature analysis for pan-cancer classification
With the increasing number of people suffering from cancer, this illness has become a major health problem worldwide. Exploring the biological functions and signaling pathways of carcinogenesis is essential for cancer detection and research. In this study, a mutation dataset for eleven cancer types...
Main authors: | Lu, Jian; Li, JiaRui; Ren, Jingxin; Ding, Shijian; Zeng, Zhenbing; Huang, Tao; Cai, Yu-Dong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2022 |
Subjects: | Oncology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9559388/ https://www.ncbi.nlm.nih.gov/pubmed/36248961 http://dx.doi.org/10.3389/fonc.2022.979336 |
_version_ | 1784807640021336064 |
---|---|
author | Lu, Jian; Li, JiaRui; Ren, Jingxin; Ding, Shijian; Zeng, Zhenbing; Huang, Tao; Cai, Yu-Dong |
author_facet | Lu, Jian; Li, JiaRui; Ren, Jingxin; Ding, Shijian; Zeng, Zhenbing; Huang, Tao; Cai, Yu-Dong |
author_sort | Lu, Jian |
collection | PubMed |
description | With the increasing number of people suffering from cancer, this illness has become a major health problem worldwide. Exploring the biological functions and signaling pathways of carcinogenesis is essential for cancer detection and research. In this study, a mutation dataset for eleven cancer types was first obtained from a web-based resource called cBioPortal for Cancer Genomics, followed by extracting 21,049 features from three aspects: relationship to GO and KEGG (enrichment features), mutated genes learned by word2vec (text features), and protein-protein interaction network analyzed by node2vec (network features). Irrelevant features were then excluded using the Boruta feature filtering method, and the retained relevant features were ranked by four feature selection methods (least absolute shrinkage and selection operator, minimum redundancy maximum relevance, Monte Carlo feature selection and light gradient boosting machine) to generate four feature-ranked lists. Incremental feature selection was used to determine the optimal number of features based on these feature lists to build the optimal classifiers and derive interpretable classification rules. The results of four feature-ranking methods were integrated to identify key functional pathways, such as olfactory transduction (hsa04740) and colorectal cancer (hsa05210), and the roles of these functional pathways in cancers were discussed in reference to literature. Overall, this machine learning-based study revealed the altered biological functions of cancers and provided a reference for the mechanisms of different cancers. |
format | Online Article Text |
id | pubmed-9559388 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-95593882022-10-14 Functional and embedding feature analysis for pan-cancer classification Lu, Jian Li, JiaRui Ren, Jingxin Ding, Shijian Zeng, Zhenbing Huang, Tao Cai, Yu-Dong Front Oncol Oncology With the increasing number of people suffering from cancer, this illness has become a major health problem worldwide. Exploring the biological functions and signaling pathways of carcinogenesis is essential for cancer detection and research. In this study, a mutation dataset for eleven cancer types was first obtained from a web-based resource called cBioPortal for Cancer Genomics, followed by extracting 21,049 features from three aspects: relationship to GO and KEGG (enrichment features), mutated genes learned by word2vec (text features), and protein-protein interaction network analyzed by node2vec (network features). Irrelevant features were then excluded using the Boruta feature filtering method, and the retained relevant features were ranked by four feature selection methods (least absolute shrinkage and selection operator, minimum redundancy maximum relevance, Monte Carlo feature selection and light gradient boosting machine) to generate four feature-ranked lists. Incremental feature selection was used to determine the optimal number of features based on these feature lists to build the optimal classifiers and derive interpretable classification rules. The results of four feature-ranking methods were integrated to identify key functional pathways, such as olfactory transduction (hsa04740) and colorectal cancer (hsa05210), and the roles of these functional pathways in cancers were discussed in reference to literature. Overall, this machine learning-based study revealed the altered biological functions of cancers and provided a reference for the mechanisms of different cancers. Frontiers Media S.A. 
2022-09-29 /pmc/articles/PMC9559388/ /pubmed/36248961 http://dx.doi.org/10.3389/fonc.2022.979336 Text en Copyright © 2022 Lu, Li, Ren, Ding, Zeng, Huang and Cai https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Oncology Lu, Jian Li, JiaRui Ren, Jingxin Ding, Shijian Zeng, Zhenbing Huang, Tao Cai, Yu-Dong Functional and embedding feature analysis for pan-cancer classification |
title | Functional and embedding feature analysis for pan-cancer classification |
title_sort | functional and embedding feature analysis for pan-cancer classification |
topic | Oncology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9559388/ https://www.ncbi.nlm.nih.gov/pubmed/36248961 http://dx.doi.org/10.3389/fonc.2022.979336 |
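
The description above mentions incremental feature selection over a ranked feature list. As a rough illustration only, the sketch below shows the general idea: evaluate classifiers built on the top-k ranked features for growing k and keep the best k. It is not the authors' code. The nearest-centroid classifier, the function names, and the toy data are all assumptions made for this example; the record does not say which classifiers or evaluation scheme the study used.

```python
from statistics import mean

def nearest_centroid_accuracy(train, test, feature_idx):
    """Accuracy of a nearest-centroid classifier restricted to feature_idx.

    Stand-in classifier chosen for brevity (an assumption, not the study's model).
    """
    # Group training rows by label, keeping only the selected features.
    by_label = {}
    for x, y in train:
        by_label.setdefault(y, []).append([x[i] for i in feature_idx])
    # One centroid per class: the per-feature mean of its training rows.
    centroids = {y: [mean(col) for col in zip(*rows)] for y, rows in by_label.items()}
    correct = 0
    for x, y in test:
        v = [x[i] for i in feature_idx]
        # Predict the class whose centroid is closest in squared Euclidean distance.
        pred = min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
        correct += pred == y
    return correct / len(test)

def incremental_feature_selection(ranked_features, train, test, step=1):
    """Score the top-k ranked features for k = step, 2*step, ...; return the best k."""
    curve = []
    for k in range(step, len(ranked_features) + 1, step):
        curve.append((k, nearest_centroid_accuracy(train, test, ranked_features[:k])))
    # Highest score wins; ties go to the smaller feature set.
    best_k, best_score = max(curve, key=lambda t: (t[1], -t[0]))
    return best_k, best_score, curve

# Hypothetical toy data: feature 0 is informative, feature 1 is neutral,
# and feature 2 is misleading (its relationship flips between train and test).
train = [([0.0, 1.0, 5.0], "A"), ([0.2, 0.0, 5.0], "A"),
         ([1.0, 0.0, -5.0], "B"), ([0.8, 1.0, -5.0], "B")]
test = [([0.1, 1.0, -5.0], "A"), ([0.3, 0.0, -5.0], "A"),
        ([0.9, 0.0, 5.0], "B"), ([0.7, 1.0, 5.0], "B")]
ranked = [0, 1, 2]  # assumed output of some feature-ranking method

best_k, best_score, curve = incremental_feature_selection(ranked, train, test)
print(best_k, best_score, curve)
```

In practice the ranked list would come from a ranking method such as those named in the description, and the score would normally come from cross-validation rather than a single held-out split.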