
Prediction of Gastrointestinal Tract Cancers Using Longitudinal Electronic Health Record Data

SIMPLE SUMMARY: Cancers of the gastrointestinal tract—including the esophagus, stomach, and intestines—are often diagnosed at an advanced stage, when curative treatments are rare. These cancers can all cause gastrointestinal bleeding, but this often occurs gradually and may be unnoticed by patients....


Bibliographic Details
Main authors: Read, Andrew J.; Zhou, Wenjing; Saini, Sameer D.; Zhu, Ji; Waljee, Akbar K.
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10000707/
https://www.ncbi.nlm.nih.gov/pubmed/36900192
http://dx.doi.org/10.3390/cancers15051399
_version_ 1784903945934602240
author Read, Andrew J.
Zhou, Wenjing
Saini, Sameer D.
Zhu, Ji
Waljee, Akbar K.
author_facet Read, Andrew J.
Zhou, Wenjing
Saini, Sameer D.
Zhu, Ji
Waljee, Akbar K.
author_sort Read, Andrew J.
collection PubMed
description SIMPLE SUMMARY: Cancers of the gastrointestinal tract, including the esophagus, stomach, and intestines, are often diagnosed at an advanced stage, when curative treatment is rarely possible. These cancers can all cause gastrointestinal bleeding, but the bleeding is often gradual and may go unnoticed by patients. Changes in routine laboratory parameters such as the complete blood count may reveal these subtle changes before clinical presentation or the development of iron deficiency anemia. The aim of our study was to develop models for the prediction of luminal gastrointestinal tract cancers (esophageal, gastric, small bowel, colorectal, anal) using data routinely available within an electronic health record, in a retrospective cohort from an academic medical center. The cohort included 148,158 individuals, with 1025 gastrointestinal tract cancers. We found that longitudinal prediction models using the complete blood count outperformed a single timepoint logistic model for 3-year cancer prediction.
ABSTRACT: Background: Luminal gastrointestinal (GI) tract cancers, including esophageal, gastric, small bowel, colorectal, and anal cancers, are often diagnosed at late stages. These tumors can cause gradual GI bleeding, which may be unrecognized but detectable through subtle laboratory changes. Our aim was to develop models to predict luminal GI tract cancers from laboratory studies and patient characteristics using logistic regression and random forest machine learning methods.
Methods: The study was a single-center, retrospective cohort study at an academic medical center of individuals who had at least two complete blood counts (CBCs), with enrollment between 2004 and 2013 and follow-up until 2018. The primary outcome was the diagnosis of GI tract cancer. Prediction models were developed using multivariable single timepoint logistic regression, longitudinal logistic regression, and random forest machine learning.
Results: The cohort included 148,158 individuals, with 1025 GI tract cancers. For 3-year prediction of GI tract cancers, the longitudinal random forest model performed best, with an area under the receiver operating characteristic curve (AuROC) of 0.750 (95% CI 0.729–0.771) and a Brier score of 0.116, compared with the longitudinal logistic regression model, which had an AuROC of 0.735 (95% CI 0.713–0.757) and a Brier score of 0.205.
Conclusions: Prediction models incorporating longitudinal features of the CBC outperformed the single timepoint logistic regression models at 3 years, with a trend toward improved prediction accuracy for the random forest machine learning model compared with the longitudinal logistic regression model.
format Online
Article
Text
id pubmed-10000707
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10000707 2023-03-11 Prediction of Gastrointestinal Tract Cancers Using Longitudinal Electronic Health Record Data Read, Andrew J. Zhou, Wenjing Saini, Sameer D. Zhu, Ji Waljee, Akbar K. Cancers (Basel) Article MDPI 2023-02-22 /pmc/articles/PMC10000707/ /pubmed/36900192 http://dx.doi.org/10.3390/cancers15051399 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Prediction of Gastrointestinal Tract Cancers Using Longitudinal Electronic Health Record Data
title_sort prediction of gastrointestinal tract cancers using longitudinal electronic health record data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10000707/
https://www.ncbi.nlm.nih.gov/pubmed/36900192
http://dx.doi.org/10.3390/cancers15051399
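The description field of this record compares models by AuROC and Brier score. As a minimal sketch of how those two metrics are computed, the Python below implements both from their standard definitions. The function names and the toy labels and risks are hypothetical illustrations; they are not the study's data or the authors' code.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and the 0/1 outcome.

    Lower is better; 0.0 means every prediction was exactly right.
    """
    if len(y_true) == 0 or len(y_true) != len(y_prob):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a randomly chosen case is ranked above a
    randomly chosen non-case (ties count as half a win)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = cancer diagnosed within the horizon, 0 = not.
labels = [0, 0, 1, 0, 1, 0]
risks = [0.10, 0.40, 0.35, 0.20, 0.80, 0.05]

print(f"AuROC: {auroc(labels, risks):.3f}")
print(f"Brier: {brier_score(labels, risks):.4f}")
```

AuROC measures only how well the predicted risks rank cases above non-cases, while the Brier score also penalizes predicted probabilities that sit far from the observed outcomes. That is why two models can have similar AuROC values and noticeably different Brier scores.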