
Artificial intelligence-aided clinical annotation of a large multi-cancer genomic dataset

To accelerate cancer research that correlates biomarkers with clinical endpoints, methods are needed to ascertain outcomes from electronic health records at scale. Here, we train deep natural language processing (NLP) models to extract outcomes for participants with any of 7 solid tumors in a precis...


Bibliographic Details
Main authors: Kehl, Kenneth L., Xu, Wenxin, Gusev, Alexander, Bakouny, Ziad, Choueiri, Toni K., Riaz, Irbaz Bin, Elmarakeby, Haitham, Van Allen, Eliezer M., Schrag, Deborah
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8674229/
https://www.ncbi.nlm.nih.gov/pubmed/34911934
http://dx.doi.org/10.1038/s41467-021-27358-6
author Kehl, Kenneth L.
Xu, Wenxin
Gusev, Alexander
Bakouny, Ziad
Choueiri, Toni K.
Riaz, Irbaz Bin
Elmarakeby, Haitham
Van Allen, Eliezer M.
Schrag, Deborah
collection PubMed
description To accelerate cancer research that correlates biomarkers with clinical endpoints, methods are needed to ascertain outcomes from electronic health records at scale. Here, we train deep natural language processing (NLP) models to extract outcomes for participants with any of 7 solid tumors in a precision oncology study. Outcomes are extracted from 305,151 imaging reports for 13,130 patients and 233,517 oncologist notes for 13,511 patients, including patients with 6 additional cancer types. NLP models recapitulate outcome annotation from these documents, including the presence of cancer, progression/worsening, response/improvement, and metastases, with excellent discrimination (AUROC > 0.90). Models generalize to cancers excluded from training and yield outcomes correlated with survival. Among patients receiving checkpoint inhibitors, we confirm that high tumor mutation burden is associated with superior progression-free survival ascertained using NLP. Here, we show that deep NLP can accelerate annotation of molecular cancer datasets with clinically meaningful endpoints to facilitate discovery.
format Online
Article
Text
id pubmed-8674229
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-8674229 2022-01-04
Nat Commun Article
Nature Publishing Group UK 2021-12-15
/pmc/articles/PMC8674229/ /pubmed/34911934 http://dx.doi.org/10.1038/s41467-021-27358-6
Text en © The Author(s) 2021
https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Artificial intelligence-aided clinical annotation of a large multi-cancer genomic dataset
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8674229/
https://www.ncbi.nlm.nih.gov/pubmed/34911934
http://dx.doi.org/10.1038/s41467-021-27358-6