
Predicting biomarkers from classifier for liver metastasis of colorectal adenocarcinomas using machine learning models

BACKGROUND: Early diagnosis of liver metastasis is important for improving the survival of colorectal adenocarcinoma (CAD) patients, and the combined use of single biomarkers in a classifier model has greatly improved metastasis prediction for several types of cancer. However,...


Bibliographic Details
Main Authors: Shuwen, Han, Xi, Yang, Qing, Zhou, Jing, Zhuang, Wei, Wu
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7520257/
http://dx.doi.org/10.1002/cam4.3289
_version_ 1783587747026436096
author Shuwen, Han
Xi, Yang
Qing, Zhou
Jing, Zhuang
Wei, Wu
author_facet Shuwen, Han
Xi, Yang
Qing, Zhou
Jing, Zhuang
Wei, Wu
author_sort Shuwen, Han
collection PubMed
description BACKGROUND: Early diagnosis of liver metastasis is important for improving the survival of colorectal adenocarcinoma (CAD) patients, and the combined use of single biomarkers in a classifier model has greatly improved metastasis prediction for several types of cancer. However, this approach has rarely been reported for CAD. This study therefore aimed to identify an optimal classifier model for CAD with liver metastasis and to explore the metastatic mechanisms of the genes used by this classifier model. METHODS: Differentially expressed genes between primary CAD samples and CAD with metastasis samples were screened from the Moffitt Cancer Center (MCC) dataset GSE131418. Six algorithms, namely LR, RF, SVM, GBDT, NN, and CatBoost, were compared on the MCC dataset GSE131418 by their test accuracy in classifying CAD samples with liver metastasis. In addition, the consortium datasets of GSE131418 and GSE81558 were used as internal and external validation sets to select the optimal method. Subsequently, functional analyses and construction of a drug‐target network were conducted for the feature genes of the optimal method. RESULTS: The optimal model, CatBoost, achieved the highest accuracy (99%) and an area under the curve of 1, and consisted of 33 feature genes. Functional analysis showed that the feature genes were closely associated with the “steroid metabolic process” and “lipoprotein particle receptor binding” (eg APOB and APOC3). In addition, the feature genes were significantly enriched in the “complement and coagulation cascade” pathways (eg FGA, F2, and F9). In a drug‐target interaction network, F2 and F9 were predicted as targets of menadione. CONCLUSION: The CatBoost model constructed using 33 feature genes showed the optimal classification performance for identifying CAD with liver metastasis.
format Online
Article
Text
id pubmed-7520257
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-75202572020-09-30 Predicting biomarkers from classifier for liver metastasis of colorectal adenocarcinomas using machine learning models Shuwen, Han Xi, Yang Qing, Zhou Jing, Zhuang Wei, Wu Cancer Med Clinical Cancer Research BACKGROUND: Early diagnosis of liver metastasis is important for improving the survival of colorectal adenocarcinoma (CAD) patients, and the combined use of single biomarkers in a classifier model has greatly improved metastasis prediction for several types of cancer. However, this approach has rarely been reported for CAD. This study therefore aimed to identify an optimal classifier model for CAD with liver metastasis and to explore the metastatic mechanisms of the genes used by this classifier model. METHODS: Differentially expressed genes between primary CAD samples and CAD with metastasis samples were screened from the Moffitt Cancer Center (MCC) dataset GSE131418. Six algorithms, namely LR, RF, SVM, GBDT, NN, and CatBoost, were compared on the MCC dataset GSE131418 by their test accuracy in classifying CAD samples with liver metastasis. In addition, the consortium datasets of GSE131418 and GSE81558 were used as internal and external validation sets to select the optimal method. Subsequently, functional analyses and construction of a drug‐target network were conducted for the feature genes of the optimal method. RESULTS: The optimal model, CatBoost, achieved the highest accuracy (99%) and an area under the curve of 1, and consisted of 33 feature genes. Functional analysis showed that the feature genes were closely associated with the “steroid metabolic process” and “lipoprotein particle receptor binding” (eg APOB and APOC3). In addition, the feature genes were significantly enriched in the “complement and coagulation cascade” pathways (eg FGA, F2, and F9).
In a drug‐target interaction network, F2 and F9 were predicted as targets of menadione. CONCLUSION: The CatBoost model constructed using 33 feature genes showed the optimal classification performance for identifying CAD with liver metastasis. John Wiley and Sons Inc. 2020-07-24 /pmc/articles/PMC7520257/ http://dx.doi.org/10.1002/cam4.3289 Text en © 2020 The Authors. Cancer Medicine published by John Wiley & Sons Ltd This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
spellingShingle Clinical Cancer Research
Shuwen, Han
Xi, Yang
Qing, Zhou
Jing, Zhuang
Wei, Wu
Predicting biomarkers from classifier for liver metastasis of colorectal adenocarcinomas using machine learning models
title Predicting biomarkers from classifier for liver metastasis of colorectal adenocarcinomas using machine learning models
title_full Predicting biomarkers from classifier for liver metastasis of colorectal adenocarcinomas using machine learning models
title_fullStr Predicting biomarkers from classifier for liver metastasis of colorectal adenocarcinomas using machine learning models
title_full_unstemmed Predicting biomarkers from classifier for liver metastasis of colorectal adenocarcinomas using machine learning models
title_short Predicting biomarkers from classifier for liver metastasis of colorectal adenocarcinomas using machine learning models
title_sort predicting biomarkers from classifier for liver metastasis of colorectal adenocarcinomas using machine learning models
topic Clinical Cancer Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7520257/
http://dx.doi.org/10.1002/cam4.3289
work_keys_str_mv AT shuwenhan predictingbiomarkersfromclassifierforlivermetastasisofcolorectaladenocarcinomasusingmachinelearningmodels
AT xiyang predictingbiomarkersfromclassifierforlivermetastasisofcolorectaladenocarcinomasusingmachinelearningmodels
AT qingzhou predictingbiomarkersfromclassifierforlivermetastasisofcolorectaladenocarcinomasusingmachinelearningmodels
AT jingzhuang predictingbiomarkersfromclassifierforlivermetastasisofcolorectaladenocarcinomasusingmachinelearningmodels
AT weiwu predictingbiomarkersfromclassifierforlivermetastasisofcolorectaladenocarcinomasusingmachinelearningmodels
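
Note on the evaluation metrics named in the abstract: the record states that candidate classifiers were compared by test accuracy and area under the curve, but gives no implementation details. The following is only a minimal sketch of how such a comparison might be computed, not the authors' pipeline. The function names, the 0.5 decision threshold, the model names, and the toy scores are all hypothetical, and the AUC is computed here with the pairwise (Mann-Whitney) formulation as an illustrative choice.

```python
from typing import Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a random
    positive sample scores higher than a random negative one (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_models(
    y_true: Sequence[int],
    model_scores: Dict[str, List[float]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the record
) -> List[Tuple[str, float, float]]:
    """Return (name, accuracy, AUC) tuples, best model first."""
    results = []
    for name, scores in model_scores.items():
        preds = [1 if s >= threshold else 0 for s in scores]
        results.append((name, accuracy(y_true, preds), roc_auc(y_true, scores)))
    # Sort by accuracy, breaking ties by AUC.
    return sorted(results, key=lambda r: (r[1], r[2]), reverse=True)


# Hypothetical toy data: 1 = liver metastasis, 0 = primary tumor.
y_test = [1, 1, 1, 0, 0, 0]
toy_scores = {
    "model_a": [0.9, 0.8, 0.7, 0.3, 0.2, 0.1],
    "model_b": [0.9, 0.4, 0.7, 0.6, 0.2, 0.1],
}

ranking = rank_models(y_test, toy_scores)
for name, acc, auc in ranking:
    print(f"{name}: accuracy={acc:.2f}, AUC={auc:.2f}")
```

In practice the scores would come from fitted models evaluated on held-out samples, and a library routine would normally replace the quadratic-time AUC loop used here for clarity.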