
Primary tumor type prediction based on US nationwide genomic profiling data in 13,522 patients

Timely and accurate primary tumor diagnosis is critical, and misdiagnoses and delays may cause undue health and economic burden. To predict primary tumor types based on genomics data from a de-identified US nationwide clinico-genomic database (CGDB), the XGBoost-based Clinico-Genomic Machine Learnin...

Bibliographic Details
Main authors: Huang, Yunru, Pfeiffer, Shannon M., Zhang, Qing
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10432138/
https://www.ncbi.nlm.nih.gov/pubmed/37593720
http://dx.doi.org/10.1016/j.csbj.2023.07.036
author Huang, Yunru
Pfeiffer, Shannon M.
Zhang, Qing
collection PubMed
description Timely and accurate primary tumor diagnosis is critical, and misdiagnoses and delays may cause undue health and economic burden. To predict primary tumor types based on genomics data from a de-identified US nationwide clinico-genomic database (CGDB), the XGBoost-based Clinico-Genomic Machine Learning Model (XC-GeM) was developed to predict 13 primary tumor types based on data from 12,060 patients in the CGDB, derived from routine clinical comprehensive genomic profiling (CGP) testing and chart-confirmed electronic health records (EHRs). The SHapley Additive exPlanations method was used to interpret model predictions. XC-GeM reached an outstanding area under the curve (AUC) of 0.965 and Matthews correlation coefficient (MCC) of 0.742 in the holdout validation dataset. In the independent validation cohort of 955 patients, XC-GeM reached 0.954 AUC and 0.733 MCC and made correct predictions in 77% of non-small cell lung cancer (NSCLC), 86% of colorectal cancer, and 84% of breast cancer patients. Top predictors for the overall model (e.g., tumor mutational burden (TMB), gender, and KRAS alteration), and for specific tumor types (e.g., TMB and EGFR alteration for NSCLC) were supported by published studies. XC-GeM also achieved an excellent AUC of 0.880 and positive MCC of 0.540 in 507 patients with missing primary diagnosis. XC-GeM is the first algorithm to predict primary tumor type using US nationwide data from routine CGP testing and chart-confirmed EHRs, showing promising performance. It may enhance the accuracy and efficiency of cancer diagnoses, enabling more timely treatment choices and potentially leading to better outcomes.
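The abstract reports a Matthews correlation coefficient for a 13-class problem. As a reading aid, here is a minimal sketch of how a multiclass MCC can be computed from true and predicted labels, using the standard generalization of the binary formula. The function name `multiclass_mcc` and the toy labels are illustrative assumptions; they are not taken from the study, and the record does not say how the authors computed the metric.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient.

    Hypothetical helper for illustration only; not the authors' code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    s = len(y_true)                                        # total samples
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)   # correct predictions
    t_counts = Counter(y_true)                             # true count per class
    p_counts = Counter(y_pred)                             # predicted count per class
    labels = set(t_counts) | set(p_counts)

    # Covariance-style terms of the multiclass formula.
    cov_tp = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    cov_tt = s * s - sum(n * n for n in t_counts.values())
    cov_pp = s * s - sum(n * n for n in p_counts.values())

    denom = sqrt(cov_tt * cov_pp)
    # By convention, return 0.0 when the denominator vanishes
    # (for example, when only one class is present).
    return cov_tp / denom if denom else 0.0


# Toy labels, invented for this example (not study data).
y_true = ["NSCLC", "NSCLC", "CRC", "CRC", "Breast", "Breast"]
y_pred = ["NSCLC", "CRC", "CRC", "CRC", "Breast", "NSCLC"]
print(round(multiclass_mcc(y_true, y_pred), 3))
```

A value of 1 indicates perfect agreement and values near 0 indicate agreement no better than chance, which is why MCC is often reported alongside AUC when class sizes are unbalanced.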
format Online
Article
Text
id pubmed-10432138
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-10432138 2023-08-17 Comput Struct Biotechnol J Research Article
Research Network of Computational and Structural Biotechnology 2023-07-26 /pmc/articles/PMC10432138/ /pubmed/37593720 http://dx.doi.org/10.1016/j.csbj.2023.07.036 Text en © 2023 The Authors. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Primary tumor type prediction based on US nationwide genomic profiling data in 13,522 patients
topic Research Article