Leveraging Genetic Reports and Electronic Health Records for the Prediction of Primary Cancers: Algorithm Development and Validation Study

BACKGROUND: Precision oncology has the potential to leverage clinical and genomic data in advancing disease prevention, diagnosis, and treatment. A key research area focuses on the early detection of primary cancers and potential prediction of cancers of unknown primary in order to facilitate optima...

Bibliographic Details
Main Authors: Zong, Nansu; Ngo, Victoria; Stone, Daniel J; Wen, Andrew; Zhao, Yiqing; Yu, Yue; Liu, Sijia; Huang, Ming; Wang, Chen; Jiang, Guoqian
Format: Online Article Text
Language: English
Published: JMIR Publications 2021
Subjects: Original Paper
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8188315/
https://www.ncbi.nlm.nih.gov/pubmed/34032581
http://dx.doi.org/10.2196/23586
author Zong, Nansu
Ngo, Victoria
Stone, Daniel J
Wen, Andrew
Zhao, Yiqing
Yu, Yue
Liu, Sijia
Huang, Ming
Wang, Chen
Jiang, Guoqian
author_facet Zong, Nansu
Ngo, Victoria
Stone, Daniel J
Wen, Andrew
Zhao, Yiqing
Yu, Yue
Liu, Sijia
Huang, Ming
Wang, Chen
Jiang, Guoqian
author_sort Zong, Nansu
collection PubMed
description BACKGROUND: Precision oncology has the potential to leverage clinical and genomic data in advancing disease prevention, diagnosis, and treatment. A key research area focuses on the early detection of primary cancers and potential prediction of cancers of unknown primary in order to facilitate optimal treatment decisions. OBJECTIVE: This study presents a methodology to harmonize phenotypic and genetic data features to classify primary cancer types and predict cancers of unknown primaries. METHODS: We extracted genetic data elements from oncology genetic reports of 1011 patients with cancer and their corresponding phenotypical data from Mayo Clinic’s electronic health records. We modeled both genetic and electronic health record data with HL7 Fast Healthcare Interoperability Resources. The semantic web Resource Description Framework was employed to generate the network-based data representation (ie, patient-phenotypic-genetic network). Based on the Resource Description Framework data graph, the Node2vec graph-embedding algorithm was applied to generate features. Multiple machine learning and deep learning backbone models were compared for cancer prediction performance. RESULTS: With 6 machine learning tasks designed in the experiment, we demonstrated that the proposed method achieved favorable results in classifying primary cancer types (area under the receiver operating characteristic curve [AUROC] 96.56% for all 9 cancer predictions on average based on the cross-validation) and predicting unknown primaries (AUROC 80.77% for all 8 cancer predictions on average for real-patient validation). To demonstrate the interpretability, 17 phenotypic and genetic features that contributed the most to the prediction of each cancer were identified and validated based on a literature review. CONCLUSIONS: Accurate prediction of cancer types can be achieved with existing electronic health record data with satisfactory precision. The integration of genetic reports improves prediction, illustrating the translational value of incorporating genetic tests early at the diagnosis stage for patients with cancer.
format Online
Article
Text
id pubmed-8188315
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-81883152021-06-28 Leveraging Genetic Reports and Electronic Health Records for the Prediction of Primary Cancers: Algorithm Development and Validation Study Zong, Nansu Ngo, Victoria Stone, Daniel J Wen, Andrew Zhao, Yiqing Yu, Yue Liu, Sijia Huang, Ming Wang, Chen Jiang, Guoqian JMIR Med Inform Original Paper BACKGROUND: Precision oncology has the potential to leverage clinical and genomic data in advancing disease prevention, diagnosis, and treatment. A key research area focuses on the early detection of primary cancers and potential prediction of cancers of unknown primary in order to facilitate optimal treatment decisions. OBJECTIVE: This study presents a methodology to harmonize phenotypic and genetic data features to classify primary cancer types and predict cancers of unknown primaries. METHODS: We extracted genetic data elements from oncology genetic reports of 1011 patients with cancer and their corresponding phenotypical data from Mayo Clinic’s electronic health records. We modeled both genetic and electronic health record data with HL7 Fast Healthcare Interoperability Resources. The semantic web Resource Description Framework was employed to generate the network-based data representation (ie, patient-phenotypic-genetic network). Based on the Resource Description Framework data graph, Node2vec graph-embedding algorithm was applied to generate features. Multiple machine learning and deep learning backbone models were compared for cancer prediction performance. RESULTS: With 6 machine learning tasks designed in the experiment, we demonstrated the proposed method achieved favorable results in classifying primary cancer types (area under the receiver operating characteristic curve [AUROC] 96.56% for all 9 cancer predictions on average based on the cross-validation) and predicting unknown primaries (AUROC 80.77% for all 8 cancer predictions on average for real-patient validation). 
To demonstrate the interpretability, 17 phenotypic and genetic features that contributed the most to the prediction of each cancer were identified and validated based on a literature review. CONCLUSIONS: Accurate prediction of cancer types can be achieved with existing electronic health record data with satisfactory precision. The integration of genetic reports improves prediction, illustrating the translational values of incorporating genetic tests early at the diagnosis stage for patients with cancer. JMIR Publications 2021-05-25 /pmc/articles/PMC8188315/ /pubmed/34032581 http://dx.doi.org/10.2196/23586 Text en ©Nansu Zong, Victoria Ngo, Daniel J Stone, Andrew Wen, Yiqing Zhao, Yue Yu, Sijia Liu, Ming Huang, Chen Wang, Guoqian Jiang. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 25.05.2021. https://creativecommons.org/licenses/by/4.0/This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
spellingShingle Original Paper
Zong, Nansu
Ngo, Victoria
Stone, Daniel J
Wen, Andrew
Zhao, Yiqing
Yu, Yue
Liu, Sijia
Huang, Ming
Wang, Chen
Jiang, Guoqian
Leveraging Genetic Reports and Electronic Health Records for the Prediction of Primary Cancers: Algorithm Development and Validation Study
title Leveraging Genetic Reports and Electronic Health Records for the Prediction of Primary Cancers: Algorithm Development and Validation Study
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8188315/
https://www.ncbi.nlm.nih.gov/pubmed/34032581
http://dx.doi.org/10.2196/23586