Primary tumor type prediction based on US nationwide genomic profiling data in 13,522 patients
Timely and accurate primary tumor diagnosis is critical, and misdiagnoses and delays may cause undue health and economic burden. To predict primary tumor types based on genomics data from a de-identified US nationwide clinico-genomic database (CGDB), the XGBoost-based Clinico-Genomic Machine Learnin...
Main Authors: | Huang, Yunru; Pfeiffer, Shannon M.; Zhang, Qing |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Research Network of Computational and Structural Biotechnology 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10432138/ https://www.ncbi.nlm.nih.gov/pubmed/37593720 http://dx.doi.org/10.1016/j.csbj.2023.07.036 |
_version_ | 1785091337598533632 |
---|---|
author | Huang, Yunru Pfeiffer, Shannon M. Zhang, Qing |
author_facet | Huang, Yunru Pfeiffer, Shannon M. Zhang, Qing |
author_sort | Huang, Yunru |
collection | PubMed |
description | Timely and accurate primary tumor diagnosis is critical, and misdiagnoses and delays may cause undue health and economic burden. To predict primary tumor types based on genomics data from a de-identified US nationwide clinico-genomic database (CGDB), the XGBoost-based Clinico-Genomic Machine Learning Model (XC-GeM) was developed to predict 13 primary tumor types based on data from 12,060 patients in the CGDB, derived from routine clinical comprehensive genomic profiling (CGP) testing and chart-confirmed electronic health records (EHRs). The SHapley Additive exPlanations method was used to interpret model predictions. XC-GeM reached an outstanding area under the curve (AUC) of 0.965 and Matthews correlation coefficient (MCC) of 0.742 in the holdout validation dataset. In the independent validation cohort of 955 patients, XC-GeM reached 0.954 AUC and 0.733 MCC and made correct predictions in 77% of non-small cell lung cancer (NSCLC), 86% of colorectal cancer, and 84% of breast cancer patients. Top predictors for the overall model (e.g., tumor mutational burden (TMB), gender, and KRAS alteration), and for specific tumor types (e.g., TMB and EGFR alteration for NSCLC) were supported by published studies. XC-GeM also achieved an excellent AUC of 0.880 and positive MCC of 0.540 in 507 patients with missing primary diagnosis. XC-GeM is the first algorithm to predict primary tumor type using US nationwide data from routine CGP testing and chart-confirmed EHRs, showing promising performance. It may enhance the accuracy and efficiency of cancer diagnoses, enabling more timely treatment choices and potentially leading to better outcomes. |
format | Online Article Text |
id | pubmed-10432138 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Research Network of Computational and Structural Biotechnology |
record_format | MEDLINE/PubMed |
spelling | pubmed-10432138 2023-08-17 Primary tumor type prediction based on US nationwide genomic profiling data in 13,522 patients Huang, Yunru Pfeiffer, Shannon M. Zhang, Qing Comput Struct Biotechnol J Research Article Timely and accurate primary tumor diagnosis is critical, and misdiagnoses and delays may cause undue health and economic burden. To predict primary tumor types based on genomics data from a de-identified US nationwide clinico-genomic database (CGDB), the XGBoost-based Clinico-Genomic Machine Learning Model (XC-GeM) was developed to predict 13 primary tumor types based on data from 12,060 patients in the CGDB, derived from routine clinical comprehensive genomic profiling (CGP) testing and chart-confirmed electronic health records (EHRs). The SHapley Additive exPlanations method was used to interpret model predictions. XC-GeM reached an outstanding area under the curve (AUC) of 0.965 and Matthews correlation coefficient (MCC) of 0.742 in the holdout validation dataset. In the independent validation cohort of 955 patients, XC-GeM reached 0.954 AUC and 0.733 MCC and made correct predictions in 77% of non-small cell lung cancer (NSCLC), 86% of colorectal cancer, and 84% of breast cancer patients. Top predictors for the overall model (e.g., tumor mutational burden (TMB), gender, and KRAS alteration), and for specific tumor types (e.g., TMB and EGFR alteration for NSCLC) were supported by published studies. XC-GeM also achieved an excellent AUC of 0.880 and positive MCC of 0.540 in 507 patients with missing primary diagnosis. XC-GeM is the first algorithm to predict primary tumor type using US nationwide data from routine CGP testing and chart-confirmed EHRs, showing promising performance. It may enhance the accuracy and efficiency of cancer diagnoses, enabling more timely treatment choices and potentially leading to better outcomes. 
Research Network of Computational and Structural Biotechnology 2023-07-26 /pmc/articles/PMC10432138/ /pubmed/37593720 http://dx.doi.org/10.1016/j.csbj.2023.07.036 Text en © 2023 The Authors https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Research Article Huang, Yunru Pfeiffer, Shannon M. Zhang, Qing Primary tumor type prediction based on US nationwide genomic profiling data in 13,522 patients |
title | Primary tumor type prediction based on US nationwide genomic profiling data in 13,522 patients |
title_full | Primary tumor type prediction based on US nationwide genomic profiling data in 13,522 patients |
title_fullStr | Primary tumor type prediction based on US nationwide genomic profiling data in 13,522 patients |
title_full_unstemmed | Primary tumor type prediction based on US nationwide genomic profiling data in 13,522 patients |
title_short | Primary tumor type prediction based on US nationwide genomic profiling data in 13,522 patients |
title_sort | primary tumor type prediction based on us nationwide genomic profiling data in 13,522 patients |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10432138/ https://www.ncbi.nlm.nih.gov/pubmed/37593720 http://dx.doi.org/10.1016/j.csbj.2023.07.036 |
work_keys_str_mv | AT huangyunru primarytumortypepredictionbasedonusnationwidegenomicprofilingdatain13522patients AT pfeiffershannonm primarytumortypepredictionbasedonusnationwidegenomicprofilingdatain13522patients AT zhangqing primarytumortypepredictionbasedonusnationwidegenomicprofilingdatain13522patients |
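
The description reports a Matthews correlation coefficient (MCC) for a 13-class prediction task. As a reading aid, the following is a minimal sketch of the standard multiclass generalization of MCC, computed from true and predicted labels. It is not the authors' code: the function name `multiclass_mcc` and the toy labels are assumptions made for illustration, and the record does not state which implementation was used.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient from two label sequences.

    Hypothetical helper for illustration; not taken from the catalogued article.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    s = len(y_true)                                   # total number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correctly predicted samples
    t_counts = Counter(y_true)                        # how often each class truly occurs
    p_counts = Counter(y_pred)                        # how often each class is predicted
    labels = set(t_counts) | set(p_counts)

    numerator = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    var_true = s * s - sum(n * n for n in t_counts.values())
    var_pred = s * s - sum(n * n for n in p_counts.values())
    denominator = sqrt(var_true * var_pred)

    # By convention, return 0.0 when the coefficient is undefined
    # (for example, when every sample is assigned to a single class).
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Toy labels only; these are not data from the study.
    truth = ["NSCLC", "CRC", "Breast", "NSCLC", "CRC", "Breast"]
    guess = ["NSCLC", "CRC", "Breast", "CRC", "CRC", "Breast"]
    print(f"MCC on toy labels: {multiclass_mcc(truth, guess):.3f}")
```

Unlike plain accuracy, this coefficient accounts for how often each class occurs and is predicted, so a model that always predicts the most common tumor type scores 0 rather than the majority-class rate.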