Development of a Machine Learning Model to Predict Recurrence of Oral Tongue Squamous Cell Carcinoma

Bibliographic Details
Main authors: Fatapour, Yasaman; Abiri, Arash; Kuan, Edward C.; Brody, James P.
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10216090/
https://www.ncbi.nlm.nih.gov/pubmed/37345106
http://dx.doi.org/10.3390/cancers15102769
Description
Summary: SIMPLE SUMMARY: In this study, we developed a generic framework to analyze the Surveillance, Epidemiology, and End Results (SEER) database to generate reliable machine learning (ML) prediction models for cancer recurrence. As a proof of concept, using 130,979 oral tongue squamous cell carcinoma patients, we generated ML models to predict 5- and 10-year recurrence with high accuracy, recall, and precision. Thus, we demonstrate an effective framework for guiding future ML efforts in predicting cancer recurrence using the SEER database, with implications for the guidance of patient management and follow-up care.

ABSTRACT: Despite diagnostic advancements, the development of reliable prognostic systems for assessing the risk of cancer recurrence remains a challenge. In this study, we developed a novel framework to generate highly representative machine-learning prediction models for oral tongue squamous cell carcinoma (OTSCC) recurrence. We identified cases of 5- and 10-year OTSCC recurrence from the SEER database. Four classification models were trained using the H2O.ai platform, and their performance was assessed by accuracy, recall, precision, and the area under the receiver operating characteristic (ROC) curve (AUC). Feature importance was studied by evaluating Shapley additive explanation contribution plots. Of the 130,979 patients studied, 36,042 (27.5%) were female, and the mean (SD) age was 58.2 (13.7) years. The Gradient Boosting Machine model performed the best, achieving 81.8% accuracy and 97.7% precision for 5-year prediction. The 10-year predictions achieved 80.0% accuracy and 94.0% precision. The number of prior tumors, patient age, the site of cancer recurrence, and tumor histology were the most significant predictors. The implementation of our novel SEER framework enabled the successful identification of patients with OTSCC recurrence, from which highly accurate and sensitive prediction models were generated. Thus, we demonstrate our framework's potential for application in various cancers to build generalizable screening tools to predict tumor recurrence.
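
The four evaluation metrics named in the abstract (accuracy, recall, precision, and ROC AUC) can be illustrated with a minimal Python sketch. This is not the authors' code and does not use the H2O.ai platform; the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made up for illustration and are unrelated to the SEER data.

```python
from typing import List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 means recurrence."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of predicted recurrences that were true recurrences."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of true recurrences that the model detected (sensitivity)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive case outscores
    a random negative case, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for this example only (not from the study).
y_true: List[int] = [1, 1, 1, 0, 0, 0, 1, 0]
scores: List[float] = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred: List[int] = [1 if s >= 0.5 else 0 for s in scores]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print("accuracy: ", accuracy(tp, fp, tn, fn))
print("precision:", precision(tp, fp))
print("recall:   ", recall(tp, fn))
print("ROC AUC:  ", roc_auc(y_true, scores))
```

Accuracy, precision, and recall depend on the chosen threshold, whereas AUC is computed from the raw scores and summarizes ranking quality across all thresholds, which is one reason the four are usually reported together.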