Prediction of future gastric cancer risk using a machine learning algorithm and comprehensive medical check-up data: A case-control study
Main authors: | , , , , , , |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6712020/ https://www.ncbi.nlm.nih.gov/pubmed/31455831 http://dx.doi.org/10.1038/s41598-019-48769-y |
Summary: | A comprehensive screening method using machine learning and many factors (biological characteristics, Helicobacter pylori infection status, endoscopic findings and blood test results), accumulated daily as data in hospitals, could improve the accuracy of screening to classify patients at high or low risk of developing gastric cancer. We used XGBoost, a classification method known for achieving numerous winning solutions in data analysis competitions, to capture nonlinear relations among many input variables and outcomes using the boosting approach to machine learning. Longitudinal and comprehensive medical check-up data were collected from 25,942 participants who underwent multiple endoscopies from 2006 to 2017 at a single facility in Japan. The participants were classified into a case group (y = 1) or a control group (y = 0) if gastric cancer was or was not detected, respectively, during a 122-month period. Among 1,431 total participants (89 cases and 1,342 controls), 1,144 (80%) were randomly selected for use in training 10 classification models; the remaining 287 (20%) were used to evaluate the models. The results showed that XGBoost outperformed logistic regression and showed the highest area under the curve value (0.899). Accumulating more data in the facility and performing further analyses including other input variables may help expand the clinical utility. |
---|
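
The evaluation design described in the summary (a random 80/20 split of 1,431 participants, then an area-under-the-curve comparison) can be illustrated with a minimal sketch. This is not the authors' pipeline: it trains no XGBoost or logistic regression model, the helper names `split_indices` and `roc_auc` are hypothetical, the seed is arbitrary, and the labels and scores in the toy example are invented purely to show how the AUC is computed.

```python
import random


def split_indices(n, train_fraction, seed):
    """Randomly partition range(n) into train and test index lists."""
    rng = random.Random(seed)          # arbitrary seed, only for repeatability
    indices = list(range(n))
    rng.shuffle(indices)
    n_train = int(n * train_fraction)  # floor, so 1,431 * 0.8 gives 1,144
    return indices[:n_train], indices[n_train:]


def roc_auc(labels, scores):
    """AUC as the probability that a random case (y = 1) receives a higher
    score than a random control (y = 0); ties count as one half."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Participant count and training fraction are taken from the summary.
train_idx, test_idx = split_indices(1431, 0.8, seed=0)

# Invented toy labels and scores (NOT study data), only to exercise roc_auc.
toy_labels = [1, 1, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

With these sizes the split yields 1,144 training and 287 evaluation indices, matching the counts in the summary. The pairwise AUC definition used here is quadratic in the number of participants, which is acceptable at this scale; a rank-based implementation would be preferable for larger cohorts.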