Prediction of future gastric cancer risk using a machine learning algorithm and comprehensive medical check-up data: A case-control study

A comprehensive screening method using machine learning and many factors (biological characteristics, Helicobacter pylori infection status, endoscopic findings and blood test results), accumulated daily as data in hospitals, could improve the accuracy of screening to classify patients at high or low...

Bibliographic Details
Main Authors: Taninaga, Junichi, Nishiyama, Yu, Fujibayashi, Kazutoshi, Gunji, Toshiaki, Sasabe, Noriko, Iijima, Kimiko, Naito, Toshio
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6712020/
https://www.ncbi.nlm.nih.gov/pubmed/31455831
http://dx.doi.org/10.1038/s41598-019-48769-y
author Taninaga, Junichi
Nishiyama, Yu
Fujibayashi, Kazutoshi
Gunji, Toshiaki
Sasabe, Noriko
Iijima, Kimiko
Naito, Toshio
collection PubMed
description A comprehensive screening method using machine learning and many factors (biological characteristics, Helicobacter pylori infection status, endoscopic findings and blood test results), accumulated daily as data in hospitals, could improve the accuracy of screening to classify patients at high or low risk of developing gastric cancer. We used XGBoost, a classification method known for achieving numerous winning solutions in data analysis competitions, to capture nonlinear relations among many input variables and outcomes using the boosting approach to machine learning. Longitudinal and comprehensive medical check-up data were collected from 25,942 participants who underwent multiple endoscopies from 2006 to 2017 at a single facility in Japan. The participants were classified into a case group (y = 1) or a control group (y = 0) if gastric cancer was or was not detected, respectively, during a 122-month period. Among 1,431 total participants (89 cases and 1,342 controls), 1,144 (80%) were randomly selected for use in training 10 classification models; the remaining 287 (20%) were used to evaluate the models. The results showed that XGBoost outperformed logistic regression and showed the highest area under the curve value (0.899). Accumulating more data in the facility and performing further analyses including other input variables may help expand the clinical utility.
format Online
Article
Text
id pubmed-6712020
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-6712020 2019-09-13 Sci Rep Article
Nature Publishing Group UK 2019-08-27 /pmc/articles/PMC6712020/ /pubmed/31455831 http://dx.doi.org/10.1038/s41598-019-48769-y Text en © The Author(s) 2019 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Prediction of future gastric cancer risk using a machine learning algorithm and comprehensive medical check-up data: A case-control study
topic Article