Toward structuring real-world data: Deep learning for extracting oncology information from clinical text with patient-level supervision
Most detailed patient information in real-world data (RWD) is only consistently available in free-text clinical documents. Manual curation is expensive and time consuming. Developing natural language processing (NLP) methods for structuring RWD is thus essential for scaling real-world evidence gener...
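Main Authors: | Preston, Sam; Wei, Mu; Rao, Rajesh; Tinn, Robert; Usuyama, Naoto; Lucas, Michael; Gu, Yu; Weerasinghe, Roshanthi; Lee, Soohee; Piening, Brian; Tittel, Paul; Valluri, Naveen; Naumann, Tristan; Bifulco, Carlo; Poon, Hoifung |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10140604/ https://www.ncbi.nlm.nih.gov/pubmed/37123439 http://dx.doi.org/10.1016/j.patter.2023.100726 |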
_version_ | 1785033199213084672 |
---|---|
author | Preston, Sam; Wei, Mu; Rao, Rajesh; Tinn, Robert; Usuyama, Naoto; Lucas, Michael; Gu, Yu; Weerasinghe, Roshanthi; Lee, Soohee; Piening, Brian; Tittel, Paul; Valluri, Naveen; Naumann, Tristan; Bifulco, Carlo; Poon, Hoifung |
author_sort | Preston, Sam |
collection | PubMed |
description | Most detailed patient information in real-world data (RWD) is only consistently available in free-text clinical documents. Manual curation is expensive and time consuming. Developing natural language processing (NLP) methods for structuring RWD is thus essential for scaling real-world evidence generation. We propose leveraging patient-level supervision from medical registries, which are often readily available and capture key patient information, for general RWD applications. We conduct an extensive study on 135,107 patients from the cancer registry of a large integrated delivery network (IDN) comprising healthcare systems in five western US states. Our deep-learning methods attain test area under the receiver operating characteristic curve (AUROC) values of 94%–99% for key tumor attributes and comparable performance on held-out data from separate health systems and states. Ablation results demonstrate the superiority of these advanced deep-learning methods. Error analysis shows that our NLP system sometimes even corrects errors in registrar labels. |
format | Online Article Text |
id | pubmed-10140604 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-10140604 2023-04-29 Toward structuring real-world data: Deep learning for extracting oncology information from clinical text with patient-level supervision Preston, Sam Wei, Mu Rao, Rajesh Tinn, Robert Usuyama, Naoto Lucas, Michael Gu, Yu Weerasinghe, Roshanthi Lee, Soohee Piening, Brian Tittel, Paul Valluri, Naveen Naumann, Tristan Bifulco, Carlo Poon, Hoifung Patterns (N Y) Article Most detailed patient information in real-world data (RWD) is only consistently available in free-text clinical documents. Manual curation is expensive and time consuming. Developing natural language processing (NLP) methods for structuring RWD is thus essential for scaling real-world evidence generation. We propose leveraging patient-level supervision from medical registries, which are often readily available and capture key patient information, for general RWD applications. We conduct an extensive study on 135,107 patients from the cancer registry of a large integrated delivery network (IDN) comprising healthcare systems in five western US states. Our deep-learning methods attain test area under the receiver operating characteristic curve (AUROC) values of 94%–99% for key tumor attributes and comparable performance on held-out data from separate health systems and states. Ablation results demonstrate the superiority of these advanced deep-learning methods. Error analysis shows that our NLP system sometimes even corrects errors in registrar labels. Elsevier 2023-04-14 /pmc/articles/PMC10140604/ /pubmed/37123439 http://dx.doi.org/10.1016/j.patter.2023.100726 Text en © 2023 The Authors https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Preston, Sam Wei, Mu Rao, Rajesh Tinn, Robert Usuyama, Naoto Lucas, Michael Gu, Yu Weerasinghe, Roshanthi Lee, Soohee Piening, Brian Tittel, Paul Valluri, Naveen Naumann, Tristan Bifulco, Carlo Poon, Hoifung Toward structuring real-world data: Deep learning for extracting oncology information from clinical text with patient-level supervision |
title | Toward structuring real-world data: Deep learning for extracting oncology information from clinical text with patient-level supervision |
title_sort | toward structuring real-world data: deep learning for extracting oncology information from clinical text with patient-level supervision |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10140604/ https://www.ncbi.nlm.nih.gov/pubmed/37123439 http://dx.doi.org/10.1016/j.patter.2023.100726 |