Leveraging Genetic Reports and Electronic Health Records for the Prediction of Primary Cancers: Algorithm Development and Validation Study
Main authors: | Zong, Nansu; Ngo, Victoria; Stone, Daniel J; Wen, Andrew; Zhao, Yiqing; Yu, Yue; Liu, Sijia; Huang, Ming; Wang, Chen; Jiang, Guoqian |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | JMIR Publications, 2021 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8188315/ https://www.ncbi.nlm.nih.gov/pubmed/34032581 http://dx.doi.org/10.2196/23586 |
author | Zong, Nansu; Ngo, Victoria; Stone, Daniel J; Wen, Andrew; Zhao, Yiqing; Yu, Yue; Liu, Sijia; Huang, Ming; Wang, Chen; Jiang, Guoqian |
---|---|
collection | PubMed |
description | BACKGROUND: Precision oncology has the potential to leverage clinical and genomic data in advancing disease prevention, diagnosis, and treatment. A key research area focuses on the early detection of primary cancers and potential prediction of cancers of unknown primary in order to facilitate optimal treatment decisions. OBJECTIVE: This study presents a methodology to harmonize phenotypic and genetic data features to classify primary cancer types and predict cancers of unknown primary. METHODS: We extracted genetic data elements from oncology genetic reports of 1011 patients with cancer and their corresponding phenotypic data from Mayo Clinic’s electronic health records. We modeled both genetic and electronic health record data with HL7 Fast Healthcare Interoperability Resources. The semantic web Resource Description Framework was employed to generate the network-based data representation (ie, a patient-phenotypic-genetic network). Based on the Resource Description Framework data graph, the Node2vec graph-embedding algorithm was applied to generate features. Multiple machine learning and deep learning backbone models were compared for cancer prediction performance. RESULTS: Across the 6 machine learning tasks designed in the experiment, the proposed method achieved favorable results in classifying primary cancer types (mean area under the receiver operating characteristic curve [AUROC] of 96.56% across all 9 cancer predictions in cross-validation) and in predicting unknown primaries (mean AUROC of 80.77% across all 8 cancer predictions in real-patient validation). To demonstrate interpretability, the 17 phenotypic and genetic features that contributed the most to the prediction of each cancer were identified and validated through a literature review. CONCLUSIONS: Accurate prediction of cancer types can be achieved with existing electronic health record data with satisfactory precision. 
The integration of genetic reports improves prediction, illustrating the translational value of incorporating genetic tests early at the diagnosis stage for patients with cancer. |
id | pubmed-8188315 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
spelling | pubmed-8188315 2021-06-28; JMIR Med Inform; Original Paper; JMIR Publications 2021-05-25; /pmc/articles/PMC8188315/ /pubmed/34032581 http://dx.doi.org/10.2196/23586; Text en; ©Nansu Zong, Victoria Ngo, Daniel J Stone, Andrew Wen, Yiqing Zhao, Yue Yu, Sijia Liu, Ming Huang, Chen Wang, Guoqian Jiang. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 25.05.2021. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included. |
topic | Original Paper |
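As a rough illustration of the feature-generation step in the description (an RDF patient-phenotypic-genetic graph fed to Node2vec), the sketch below collapses a few subject-predicate-object triples into a graph and generates Node2vec's second-order biased random walks, controlled by the return parameter `p` and the in-out parameter `q`. Everything here is an assumption for illustration: the triples, node names, predicates, and parameter values are hypothetical and are not the study's FHIR/RDF data, and the graph drops predicates and treats edges as undirected. Node2vec proper then trains a skip-gram model on these walks (for example, gensim's `Word2Vec` with `sg=1`); that step is omitted.

```python
import random
from collections import defaultdict

# Hypothetical RDF-style (subject, predicate, object) triples; NOT data from the study.
TRIPLES = [
    ("patient:1", "hasCondition", "phenotype:anemia"),
    ("patient:1", "hasVariant", "gene:KRAS"),
    ("patient:2", "hasCondition", "phenotype:anemia"),
    ("patient:2", "hasVariant", "gene:TP53"),
    ("patient:3", "hasVariant", "gene:KRAS"),
    ("patient:3", "hasCondition", "phenotype:weight_loss"),
]


def build_graph(triples):
    """Collapse triples into an undirected adjacency map (predicates are dropped)."""
    adj = defaultdict(set)
    for subject, _predicate, obj in triples:
        adj[subject].add(obj)
        adj[obj].add(subject)
    # Sorted neighbor lists keep walks reproducible for a fixed seed.
    return {node: sorted(nbrs) for node, nbrs in adj.items()}


def node2vec_walk(adj, start, length, p, q, rng):
    """One Node2vec second-order random walk on an unweighted graph."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = adj[cur]
        if not nbrs:
            break  # dead end: stop early
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first step is uniform
            continue
        prev = walk[-2]
        prev_nbrs = set(adj[prev])
        weights = []
        for x in nbrs:
            if x == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif x in prev_nbrs:
                weights.append(1.0)  # stay within distance 1 of prev
            else:
                weights.append(1.0 / q)  # move outward, away from prev
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


def generate_walks(adj, num_walks, length, p=1.0, q=1.0, seed=0):
    """Make num_walks passes over all nodes, starting one walk per node per pass."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(node2vec_walk(adj, node, length, p, q, rng))
    return walks


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    for w in generate_walks(graph, num_walks=1, length=5, p=1.0, q=0.5):
        print(" -> ".join(w))
```

A small `q` (here 0.5) biases walks outward, which tends to link patients that share phenotypes or variants through longer paths; a small `p` biases them back toward the node just visited. Which values suit a patient-phenotypic-genetic graph is an empirical question the record does not answer.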
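The RESULTS report AUROC averaged over several per-cancer predictions. A minimal sketch, assuming each cancer type is scored as a separate one-vs-rest binary task and the reported figure is an unweighted mean of the per-task AUROCs (the record does not state how tasks or folds were averaged). AUROC is computed through its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The task names, labels, and scores are invented for illustration.

```python
def auroc(labels, scores):
    """AUROC for binary labels (1 = positive, 0 = negative).

    Computed as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auroc(tasks):
    """Unweighted mean AUROC over tasks given as {name: (labels, scores)}."""
    per_task = {name: auroc(labels, scores) for name, (labels, scores) in tasks.items()}
    return sum(per_task.values()) / len(per_task), per_task


if __name__ == "__main__":
    # Hypothetical one-vs-rest tasks; labels and scores are invented.
    tasks = {
        "cancer_A": ([1, 1, 0, 0], [0.9, 0.2, 0.5, 0.1]),
        "cancer_B": ([1, 0, 0], [0.7, 0.4, 0.8]),
    }
    mean, per_task = mean_auroc(tasks)
    print(per_task)  # {'cancer_A': 0.75, 'cancer_B': 0.5}
    print(mean)      # 0.625
```

The pairwise loop costs O(positives × negatives), which is fine for a sketch; for larger cohorts, scikit-learn's `roc_auc_score` computes the same per-task value more efficiently.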