Establishing machine learning models to predict the early risk of gastric cancer based on lifestyle factors

Bibliographic Details
Main Authors: Afrash, Mohammad Reza; Shafiee, Mohsen; Kazemi-Arpanahi, Hadi
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9832798/
https://www.ncbi.nlm.nih.gov/pubmed/36627564
http://dx.doi.org/10.1186/s12876-022-02626-x
_version_ 1784868128448053248
author Afrash, Mohammad Reza
Shafiee, Mohsen
Kazemi-Arpanahi, Hadi
collection PubMed
description BACKGROUND: Gastric cancer is one of the leading causes of death worldwide. Screening for gastric cancer greatly relies on endoscopy and pathology biopsy, which are invasive and pose financial burdens. Thus, the prevention of the disease by modifying lifestyle-related behaviors and dietary habits or even the prevention of risk factor formation is of great importance. This study aimed to construct an inexpensive, non-invasive, fast, and high-precision diagnostic model using six machine learning (ML) algorithms to classify patients at high or low risk of developing gastric cancer by analyzing individual lifestyle factors. METHODS: This retrospective study used the data of 2029 individuals from the gastric cancer database of Ayatollah Taleghani Hospital in Abadan City, Iran. The data were randomly separated into training and test sets (ratio 0.7:0.3). Six ML methods, including multilayer perceptron (MLP), support vector machine (SVM) (linear kernel), SVM (RBF kernel), k-nearest neighbors (KNN) (K = 1, 3, 7, 9), random forest (RF), and eXtreme Gradient Boosting (XGBoost), were trained to construct prognostic models before and after performing the Relief feature selection method. Finally, to evaluate the models’ performance, the metrics derived from the confusion matrix were calculated via a test split and cross-validation. RESULTS: This study found 11 important influence factors for the risk of gastric cancer, such as Helicobacter pylori infection, high salt intake, and chronic atrophic gastritis, among other factors. Comparisons indicated that the XGBoost had the best performance for the risk prediction of gastric cancer. CONCLUSIONS: The results suggest that based on simple baseline patient data, the ML techniques have the potential to start the prescreening of gastric cancer and identify high-risk individuals who should proceed with invasive examinations. Our model could also considerably lessen the number of cases that need endoscopic surveillance. Future studies are required to validate the efficacy of the models in a larger and multicenter population.
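The evaluation steps named in the abstract (a random 0.7:0.3 split and metrics derived from the confusion matrix) can be illustrated with a minimal, standard-library Python sketch. This is not the authors' code: the function names `split_70_30` and `confusion_matrix_metrics`, the seed, the rounding of the cut point, and the choice of metrics (accuracy, sensitivity, specificity, precision, F1) are assumptions made for illustration, since the abstract does not list which metrics were reported.

```python
import random


def split_70_30(records, seed=0):
    """Shuffle a copy of `records` and split it 70% / 30% (hypothetical helper)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = round(0.7 * len(shuffled))       # assumption: cut point is rounded
    return shuffled[:cut], shuffled[cut:]


def confusion_matrix_metrics(y_true, y_pred, positive=1):
    """Count TP/FP/TN/FN for a binary task and derive common metrics from them."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # recall for the high-risk class
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


if __name__ == "__main__":
    # Placeholder record ids only; no patient data is used here.
    train, test = split_70_30(range(2029), seed=42)
    print(len(train), len(test))

    # Made-up labels (1 = high risk, 0 = low risk) to exercise the metrics.
    print(confusion_matrix_metrics([1, 1, 1, 0, 0, 0, 0, 1],
                                   [1, 0, 1, 0, 0, 1, 0, 1]))
```

Under this sketch's rounding assumption, 2029 records split into 1420 training and 609 test records; the record itself does not state the study's actual split sizes.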
format Online
Article
Text
id pubmed-9832798
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9832798 2023-01-12 Establishing machine learning models to predict the early risk of gastric cancer based on lifestyle factors Afrash, Mohammad Reza Shafiee, Mohsen Kazemi-Arpanahi, Hadi BMC Gastroenterol Research BioMed Central 2023-01-10 /pmc/articles/PMC9832798/ /pubmed/36627564 http://dx.doi.org/10.1186/s12876-022-02626-x Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Establishing machine learning models to predict the early risk of gastric cancer based on lifestyle factors
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9832798/
https://www.ncbi.nlm.nih.gov/pubmed/36627564
http://dx.doi.org/10.1186/s12876-022-02626-x