
Toward structuring real-world data: Deep learning for extracting oncology information from clinical text with patient-level supervision

Most detailed patient information in real-world data (RWD) is only consistently available in free-text clinical documents. Manual curation is expensive and time consuming. Developing natural language processing (NLP) methods for structuring RWD is thus essential for scaling real-world evidence gener...


Bibliographic Details
Main authors: Preston, Sam, Wei, Mu, Rao, Rajesh, Tinn, Robert, Usuyama, Naoto, Lucas, Michael, Gu, Yu, Weerasinghe, Roshanthi, Lee, Soohee, Piening, Brian, Tittel, Paul, Valluri, Naveen, Naumann, Tristan, Bifulco, Carlo, Poon, Hoifung
Format: Online Article Text
Language: English
Published: Elsevier 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10140604/
https://www.ncbi.nlm.nih.gov/pubmed/37123439
http://dx.doi.org/10.1016/j.patter.2023.100726
author Preston, Sam
Wei, Mu
Rao, Rajesh
Tinn, Robert
Usuyama, Naoto
Lucas, Michael
Gu, Yu
Weerasinghe, Roshanthi
Lee, Soohee
Piening, Brian
Tittel, Paul
Valluri, Naveen
Naumann, Tristan
Bifulco, Carlo
Poon, Hoifung
collection PubMed
description Most detailed patient information in real-world data (RWD) is only consistently available in free-text clinical documents. Manual curation is expensive and time consuming. Developing natural language processing (NLP) methods for structuring RWD is thus essential for scaling real-world evidence generation. We propose leveraging patient-level supervision from medical registries, which are often readily available and capture key patient information, for general RWD applications. We conduct an extensive study on 135,107 patients from the cancer registry of a large integrated delivery network (IDN) comprising healthcare systems in five western US states. Our deep-learning methods attain test area under the receiver operating characteristic curve (AUROC) values of 94%–99% for key tumor attributes and comparable performance on held-out data from separate health systems and states. Ablation results demonstrate the superiority of these advanced deep-learning methods. Error analysis shows that our NLP system sometimes even corrects errors in registrar labels.
format Online
Article
Text
id pubmed-10140604
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-101406042023-04-29 Toward structuring real-world data: Deep learning for extracting oncology information from clinical text with patient-level supervision Preston, Sam Wei, Mu Rao, Rajesh Tinn, Robert Usuyama, Naoto Lucas, Michael Gu, Yu Weerasinghe, Roshanthi Lee, Soohee Piening, Brian Tittel, Paul Valluri, Naveen Naumann, Tristan Bifulco, Carlo Poon, Hoifung Patterns (N Y) Article Most detailed patient information in real-world data (RWD) is only consistently available in free-text clinical documents. Manual curation is expensive and time consuming. Developing natural language processing (NLP) methods for structuring RWD is thus essential for scaling real-world evidence generation. We propose leveraging patient-level supervision from medical registries, which are often readily available and capture key patient information, for general RWD applications. We conduct an extensive study on 135,107 patients from the cancer registry of a large integrated delivery network (IDN) comprising healthcare systems in five western US states. Our deep-learning methods attain test area under the receiver operating characteristic curve (AUROC) values of 94%–99% for key tumor attributes and comparable performance on held-out data from separate health systems and states. Ablation results demonstrate the superiority of these advanced deep-learning methods. Error analysis shows that our NLP system sometimes even corrects errors in registrar labels. Elsevier 2023-04-14 /pmc/articles/PMC10140604/ /pubmed/37123439 http://dx.doi.org/10.1016/j.patter.2023.100726 Text en © 2023 The Authors https://creativecommons.org/licenses/by/4.0/This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title Toward structuring real-world data: Deep learning for extracting oncology information from clinical text with patient-level supervision
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10140604/
https://www.ncbi.nlm.nih.gov/pubmed/37123439
http://dx.doi.org/10.1016/j.patter.2023.100726