Application of data mining methods to improve screening for the risk of early gastric cancer

BACKGROUND: Although gastric cancer is a malignancy with high morbidity and mortality in China, the survival rate of patients with early gastric cancer (EGC) is high after surgical resection. Strengthening diagnosis and screening is the key to improving the survival and quality of life of patients with...

Bibliographic Details
Main authors: Liu, Mi-Mi, Wen, Li, Liu, Yong-Jia, Cai, Qiao, Li, Li-Ting, Cai, Yong-Ming
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6284275/
https://www.ncbi.nlm.nih.gov/pubmed/30526601
http://dx.doi.org/10.1186/s12911-018-0689-4
_version_ 1783379305865150464
author Liu, Mi-Mi
Wen, Li
Liu, Yong-Jia
Cai, Qiao
Li, Li-Ting
Cai, Yong-Ming
author_sort Liu, Mi-Mi
collection PubMed
description BACKGROUND: Although gastric cancer is a malignancy with high morbidity and mortality in China, the survival rate of patients with early gastric cancer (EGC) is high after surgical resection. Strengthening diagnosis and screening is the key to improving the survival and quality of life of patients with EGC. This study applied data mining methods to improve screening for the risk of EGC on the basis of noninvasive factors, and identified important influencing factors for the risk of EGC. METHODS: The dataset was derived from a project of the First Hospital Affiliated Guangdong Pharmaceutical University. A series of questionnaire surveys, serological examinations and endoscopy plus pathology biopsy were conducted in 618 patients with gastric diseases. Patients were categorized as low or high risk of EGC according to the results of endoscopy plus pathology biopsy. The synthetic minority oversampling technique (SMOTE) was used to address the imbalance between the two risk categories. Four classification models of the risk of EGC were established: logistic regression (LR) and three data mining algorithms. RESULTS: The three data mining models had higher accuracy than the LR model. The gain curves of the three data mining models were convex and lay closer to the ideal curve than that of the LR model. The AUCs of the three data mining models were also larger than that of the LR model. The three data mining models therefore predicted the risk of EGC more effectively than the LR model. Moreover, this study found 16 important influencing factors for the risk of EGC, such as occupation, Helicobacter pylori infection, and drinking hot water. CONCLUSIONS: The three data mining models show better predictive performance than the LR model, and can therefore effectively evaluate the risk of EGC and assist clinicians in improving the diagnosis and screening of EGC. Sixteen important influencing factors for the risk of EGC were identified, which may help assess gastric carcinogenesis and prompt early prevention and early detection of gastric cancer. This study may also help clinical researchers select and apply the optimal predictive models.
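The abstract names SMOTE and AUC but gives no implementation details, so the following is only a minimal pure-Python sketch of the two ideas, not the authors' code. The function names `smote` and `auc`, the neighbour count `k`, the random seed, and the toy data are all assumptions made for illustration; the three data mining algorithms are not named in this record and are not reproduced here.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive is scored above a random negative (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three high-risk (minority) points with two features.
high_risk = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2)]
new_points = smote(high_risk, n_new=4, k=2)
print(new_points)

# Hypothetical labels (1 = high risk) and model scores.
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Each synthetic point lies on a line segment between two existing minority points, which is why oversampling this way does not simply duplicate records. In practice an established library implementation would normally be preferred over a hand-written one.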
format Online
Article
Text
id pubmed-6284275
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6284275 2018-12-14 Application of data mining methods to improve screening for the risk of early gastric cancer Liu, Mi-Mi Wen, Li Liu, Yong-Jia Cai, Qiao Li, Li-Ting Cai, Yong-Ming BMC Med Inform Decis Mak Research BioMed Central 2018-12-07 /pmc/articles/PMC6284275/ /pubmed/30526601 http://dx.doi.org/10.1186/s12911-018-0689-4 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Application of data mining methods to improve screening for the risk of early gastric cancer
topic Research