Artificial intelligence-aided clinical annotation of a large multi-cancer genomic dataset
To accelerate cancer research that correlates biomarkers with clinical endpoints, methods are needed to ascertain outcomes from electronic health records at scale. Here, we train deep natural language processing (NLP) models to extract outcomes for participants with any of 7 solid tumors in a precis...
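Main authors: | Kehl, Kenneth L.; Xu, Wenxin; Gusev, Alexander; Bakouny, Ziad; Choueiri, Toni K.; Riaz, Irbaz Bin; Elmarakeby, Haitham; Van Allen, Eliezer M.; Schrag, Deborah |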
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8674229/ https://www.ncbi.nlm.nih.gov/pubmed/34911934 http://dx.doi.org/10.1038/s41467-021-27358-6 |
_version_ | 1784615603791724544 |
---|---|
author | Kehl, Kenneth L.; Xu, Wenxin; Gusev, Alexander; Bakouny, Ziad; Choueiri, Toni K.; Riaz, Irbaz Bin; Elmarakeby, Haitham; Van Allen, Eliezer M.; Schrag, Deborah |
author_facet | Kehl, Kenneth L.; Xu, Wenxin; Gusev, Alexander; Bakouny, Ziad; Choueiri, Toni K.; Riaz, Irbaz Bin; Elmarakeby, Haitham; Van Allen, Eliezer M.; Schrag, Deborah |
author_sort | Kehl, Kenneth L. |
collection | PubMed |
description | To accelerate cancer research that correlates biomarkers with clinical endpoints, methods are needed to ascertain outcomes from electronic health records at scale. Here, we train deep natural language processing (NLP) models to extract outcomes for participants with any of 7 solid tumors in a precision oncology study. Outcomes are extracted from 305,151 imaging reports for 13,130 patients and 233,517 oncologist notes for 13,511 patients, including patients with 6 additional cancer types. NLP models recapitulate outcome annotation from these documents, including the presence of cancer, progression/worsening, response/improvement, and metastases, with excellent discrimination (AUROC > 0.90). Models generalize to cancers excluded from training and yield outcomes correlated with survival. Among patients receiving checkpoint inhibitors, we confirm that high tumor mutation burden is associated with superior progression-free survival ascertained using NLP. Here, we show that deep NLP can accelerate annotation of molecular cancer datasets with clinically meaningful endpoints to facilitate discovery. |
format | Online Article Text |
id | pubmed-8674229 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-8674229 2022-01-04 Artificial intelligence-aided clinical annotation of a large multi-cancer genomic dataset Kehl, Kenneth L.; Xu, Wenxin; Gusev, Alexander; Bakouny, Ziad; Choueiri, Toni K.; Riaz, Irbaz Bin; Elmarakeby, Haitham; Van Allen, Eliezer M.; Schrag, Deborah Nat Commun Article To accelerate cancer research that correlates biomarkers with clinical endpoints, methods are needed to ascertain outcomes from electronic health records at scale. Here, we train deep natural language processing (NLP) models to extract outcomes for participants with any of 7 solid tumors in a precision oncology study. Outcomes are extracted from 305,151 imaging reports for 13,130 patients and 233,517 oncologist notes for 13,511 patients, including patients with 6 additional cancer types. NLP models recapitulate outcome annotation from these documents, including the presence of cancer, progression/worsening, response/improvement, and metastases, with excellent discrimination (AUROC > 0.90). Models generalize to cancers excluded from training and yield outcomes correlated with survival. Among patients receiving checkpoint inhibitors, we confirm that high tumor mutation burden is associated with superior progression-free survival ascertained using NLP. Here, we show that deep NLP can accelerate annotation of molecular cancer datasets with clinically meaningful endpoints to facilitate discovery. Nature Publishing Group UK 2021-12-15 /pmc/articles/PMC8674229/ /pubmed/34911934 http://dx.doi.org/10.1038/s41467-021-27358-6 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
title | Artificial intelligence-aided clinical annotation of a large multi-cancer genomic dataset |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8674229/ https://www.ncbi.nlm.nih.gov/pubmed/34911934 http://dx.doi.org/10.1038/s41467-021-27358-6 |