Prediction of future gastric cancer risk using a machine learning algorithm and comprehensive medical check-up data: A case-control study
A comprehensive screening method using machine learning and many factors (biological characteristics, Helicobacter pylori infection status, endoscopic findings and blood test results), accumulated daily as data in hospitals, could improve the accuracy of screening to classify patients at high or low...
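Main authors: | Taninaga, Junichi; Nishiyama, Yu; Fujibayashi, Kazutoshi; Gunji, Toshiaki; Sasabe, Noriko; Iijima, Kimiko; Naito, Toshio |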
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6712020/ https://www.ncbi.nlm.nih.gov/pubmed/31455831 http://dx.doi.org/10.1038/s41598-019-48769-y |
_version_ | 1783446604539232256 |
---|---|
author | Taninaga, Junichi Nishiyama, Yu Fujibayashi, Kazutoshi Gunji, Toshiaki Sasabe, Noriko Iijima, Kimiko Naito, Toshio |
author_sort | Taninaga, Junichi |
collection | PubMed |
description | A comprehensive screening method using machine learning and many factors (biological characteristics, Helicobacter pylori infection status, endoscopic findings and blood test results), accumulated daily as data in hospitals, could improve the accuracy of screening to classify patients at high or low risk of developing gastric cancer. We used XGBoost, a classification method known for achieving numerous winning solutions in data analysis competitions, to capture nonlinear relations among many input variables and outcomes using the boosting approach to machine learning. Longitudinal and comprehensive medical check-up data were collected from 25,942 participants who underwent multiple endoscopies from 2006 to 2017 at a single facility in Japan. The participants were classified into a case group (y = 1) or a control group (y = 0) if gastric cancer was or was not detected, respectively, during a 122-month period. Among 1,431 total participants (89 cases and 1,342 controls), 1,144 (80%) were randomly selected for use in training 10 classification models; the remaining 287 (20%) were used to evaluate the models. The results showed that XGBoost outperformed logistic regression and showed the highest area under the curve value (0.899). Accumulating more data in the facility and performing further analyses including other input variables may help expand the clinical utility. |
format | Online Article Text |
id | pubmed-6712020 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-6712020 2019-09-13 Sci Rep Article Nature Publishing Group UK 2019-08-27 /pmc/articles/PMC6712020/ /pubmed/31455831 http://dx.doi.org/10.1038/s41598-019-48769-y Text en © The Author(s) 2019 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
title | Prediction of future gastric cancer risk using a machine learning algorithm and comprehensive medical check-up data: A case-control study |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6712020/ https://www.ncbi.nlm.nih.gov/pubmed/31455831 http://dx.doi.org/10.1038/s41598-019-48769-y |
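
The description above reports a random 80%/20% split of 1,431 participants (1,144 for training, 287 for evaluation) and compares models by area under the curve. A minimal sketch of those two steps follows, in plain Python with no external libraries. It is an illustration only and not the authors' code: the function names `split_indices` and `roc_auc`, the seed, and the toy labels and scores are all assumptions, and no model is trained here.

```python
import random


def split_indices(n, train_fraction=0.8, seed=0):
    """Randomly partition range(n) into training and evaluation index lists.

    Hypothetical helper: the seed and the rounding rule (floor) are assumptions.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = int(n * train_fraction)  # floor of n * train_fraction
    return indices[:n_train], indices[n_train:]


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen case (y = 1) receives a
    higher score than a randomly chosen control (y = 0); ties count as 0.5.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Split sizes for the participant count quoted in the description.
train_idx, eval_idx = split_indices(1431)
print(len(train_idx), len(eval_idx))

# Toy labels and scores (made up for illustration, not study data).
toy_labels = [1, 1, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.3, 0.1]
print(roc_auc(toy_labels, toy_scores))
```

With only 89 cases among 1,431 participants, a purely random split can leave very few cases in the evaluation set, so a stratified split may be worth considering. The description does not say whether the authors stratified.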