Weakly supervised natural language processing for assessing patient-centered outcome following prostate cancer treatment

BACKGROUND: The population-based assessment of patient-centered outcomes (PCOs) has been limited by the difficulty of collecting these data efficiently and accurately. Natural language processing (NLP) pipelines can determine whether a clinical note within an electronic medical record contains evidence on these da...

Bibliographic Details
Main authors: Banerjee, Imon; Li, Kevin; Seneviratne, Martin; Ferrari, Michelle; Seto, Tina; Brooks, James D; Rubin, Daniel L; Hernandez-Boussard, Tina
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6482003/
https://www.ncbi.nlm.nih.gov/pubmed/31032481
http://dx.doi.org/10.1093/jamiaopen/ooy057
_version_ 1783413818239483904
author Banerjee, Imon
Li, Kevin
Seneviratne, Martin
Ferrari, Michelle
Seto, Tina
Brooks, James D
Rubin, Daniel L
Hernandez-Boussard, Tina
author_sort Banerjee, Imon
collection PubMed
description BACKGROUND: The population-based assessment of patient-centered outcomes (PCOs) has been limited by the difficulty of collecting these data efficiently and accurately. Natural language processing (NLP) pipelines can determine whether a clinical note within an electronic medical record contains evidence on these data. We present and demonstrate the accuracy of an NLP pipeline that aims to assess the presence, absence, or risk discussion of two important PCOs following prostate cancer treatment: urinary incontinence (UI) and bowel dysfunction (BD).
METHODS: We propose a weakly supervised NLP approach that annotates electronic medical record clinical notes without requiring manual chart review. A weighted function of neural word embeddings was used to create a sentence-level vector representation of relevant expressions extracted from the clinical notes. Sentence vectors were used as input for a multinomial logistic model, with the output being presence, absence, or risk discussion of UI/BD. The classifier was trained on automated sentence annotations that depend only on domain-specific dictionaries (weak supervision).
RESULTS: The model achieved an average F1 score of 0.86 for the sentence-level, three-tier classification task (presence/absence/risk) for both UI and BD. The model also outperformed a pre-existing rule-based model for note-level annotation of UI by a significant margin.
CONCLUSIONS: We demonstrate a machine learning method to categorize clinical notes based on important PCOs. The method trains a classifier on sentence vector representations labeled with a domain-specific dictionary, which eliminates the need for manual engineering of linguistic rules or manual chart review for extracting the PCOs. The weakly supervised NLP pipeline showed promising sensitivity and specificity for identifying important PCOs in unstructured clinical text notes compared to rule-based algorithms.
TRIAL REGISTRATION: This is a chart review study approved by an Institutional Review Board (IRB).
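The pipeline described in the abstract (dictionary-based weak labels, weighted word-embedding sentence vectors, multinomial logistic model) can be illustrated with a minimal sketch. This is not the authors' implementation: the term lists, the label priority, the random stand-in embeddings, the weighting scheme, and all function names below are assumptions made for illustration, since the record does not give those details.

```python
import math
import random

# Hypothetical dictionaries; the study's actual term lists are not in this record.
TARGET_TERMS = {"incontinence", "leakage"}
NEGATION_TERMS = {"no", "denies", "without"}
RISK_TERMS = {"risk", "risks", "possible"}
LABELS = ["presence", "absence", "risk"]


def weak_label(tokens):
    """Label a sentence from dictionary matches alone; None if no target term."""
    words = set(tokens)
    if not words & TARGET_TERMS:
        return None
    if words & RISK_TERMS:          # assumed priority: risk, then absence
        return "risk"
    if words & NEGATION_TERMS:
        return "absence"
    return "presence"


def make_embeddings(vocab, dim=8, seed=0):
    """Stand-in for trained neural word embeddings: fixed random vectors."""
    rng = random.Random(seed)
    return {w: [rng.uniform(-1, 1) for _ in range(dim)] for w in sorted(vocab)}


def sentence_vector(tokens, emb, weights):
    """Weighted average of word vectors (the weighting scheme is an assumption)."""
    dim = len(next(iter(emb.values())))
    total, norm = [0.0] * dim, 0.0
    for t in tokens:
        if t in emb:
            w = weights.get(t, 1.0)
            norm += w
            for i, v in enumerate(emb[t]):
                total[i] += w * v
    return [x / norm for x in total] if norm else total


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def train(X, y, n_classes, epochs=500, lr=0.5):
    """Multinomial logistic regression fitted by plain stochastic gradient descent."""
    dim = len(X[0])
    W = [[0.0] * (dim + 1) for _ in range(n_classes)]  # last column is the bias
    for _ in range(epochs):
        for x, label in zip(X, y):
            xb = x + [1.0]
            p = softmax([sum(w * v for w, v in zip(row, xb)) for row in W])
            for k in range(n_classes):
                g = p[k] - (1.0 if k == label else 0.0)
                for i in range(dim + 1):
                    W[k][i] -= lr * g * xb[i]
    return W


def predict(W, x):
    xb = x + [1.0]
    scores = [sum(w * v for w, v in zip(row, xb)) for row in W]
    return LABELS[scores.index(max(scores))]


# Toy sentences (invented for illustration, not taken from clinical notes).
sentences = [
    "patient reports urinary incontinence",
    "patient denies incontinence",
    "discussed risk of incontinence after surgery",
    "leakage at night",
    "no leakage",
    "possible leakage discussed",
]
tokenized = [s.split() for s in sentences]
emb = make_embeddings({t for toks in tokenized for t in toks})
weights = {t: 2.0 for t in NEGATION_TERMS | RISK_TERMS}  # assumed up-weighting

labeled = [(toks, weak_label(toks)) for toks in tokenized]
labeled = [(toks, lab) for toks, lab in labeled if lab is not None]
X = [sentence_vector(toks, emb, weights) for toks, _ in labeled]
y = [LABELS.index(lab) for _, lab in labeled]
W = train(X, y, n_classes=len(LABELS))
```

Calling `predict(W, sentence_vector("denies any leakage".split(), emb, weights))` returns one of the three labels for an unseen sentence. With random stand-in embeddings the prediction carries no clinical meaning; the sketch only shows how the three stages connect without any manually reviewed labels.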
format Online
Article
Text
id pubmed-6482003
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6482003 2019-04-24 Weakly supervised natural language processing for assessing patient-centered outcome following prostate cancer treatment Banerjee, Imon; Li, Kevin; Seneviratne, Martin; Ferrari, Michelle; Seto, Tina; Brooks, James D; Rubin, Daniel L; Hernandez-Boussard, Tina JAMIA Open Research and Applications Oxford University Press 2019-01-04 /pmc/articles/PMC6482003/ /pubmed/31032481 http://dx.doi.org/10.1093/jamiaopen/ooy057 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Weakly supervised natural language processing for assessing patient-centered outcome following prostate cancer treatment
topic Research and Applications