Weakly supervised natural language processing for assessing patient-centered outcome following prostate cancer treatment
BACKGROUND: The population-based assessment of patient-centered outcomes (PCOs) has been limited by the efficient and accurate collection of these data. Natural language processing (NLP) pipelines can determine whether a clinical note within an electronic medical record contains evidence on these da...
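Main Authors: | Banerjee, Imon; Li, Kevin; Seneviratne, Martin; Ferrari, Michelle; Seto, Tina; Brooks, James D; Rubin, Daniel L; Hernandez-Boussard, Tina |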
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2019 |
Subjects: | Research and Applications |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6482003/ https://www.ncbi.nlm.nih.gov/pubmed/31032481 http://dx.doi.org/10.1093/jamiaopen/ooy057 |
_version_ | 1783413818239483904 |
---|---|
author | Banerjee, Imon Li, Kevin Seneviratne, Martin Ferrari, Michelle Seto, Tina Brooks, James D Rubin, Daniel L Hernandez-Boussard, Tina |
author_facet | Banerjee, Imon Li, Kevin Seneviratne, Martin Ferrari, Michelle Seto, Tina Brooks, James D Rubin, Daniel L Hernandez-Boussard, Tina |
author_sort | Banerjee, Imon |
collection | PubMed |
description | BACKGROUND: The population-based assessment of patient-centered outcomes (PCOs) has been limited by the difficulty of collecting these data efficiently and accurately. Natural language processing (NLP) pipelines can determine whether a clinical note within an electronic medical record contains evidence of these outcomes. We present and demonstrate the accuracy of an NLP pipeline that assesses the presence, absence, or risk discussion of two important PCOs following prostate cancer treatment: urinary incontinence (UI) and bowel dysfunction (BD). METHODS: We propose a weakly supervised NLP approach that annotates electronic medical record clinical notes without requiring manual chart review. A weighted function of neural word embeddings was used to create a sentence-level vector representation of relevant expressions extracted from the clinical notes. Sentence vectors were used as input to a multinomial logistic model, whose output was presence, absence, or risk discussion of UI/BD. The classifier was trained on automated sentence annotations that depend only on domain-specific dictionaries (weak supervision). RESULTS: The model achieved an average F1 score of 0.86 for the sentence-level, three-tier classification task (presence/absence/risk) for both UI and BD. The model also outperformed a pre-existing rule-based model for note-level annotation of UI by a significant margin. CONCLUSIONS: We demonstrate a machine learning method that categorizes clinical notes by important PCOs. It trains a classifier on sentence vector representations labeled with a domain-specific dictionary, which eliminates the need for manually engineered linguistic rules or manual chart review to extract the PCOs. The weakly supervised NLP pipeline showed promising sensitivity and specificity for identifying important PCOs in unstructured clinical text notes compared to rule-based algorithms. TRIAL REGISTRATION: This is a chart review study approved by the Institutional Review Board (IRB). |
format | Online Article Text |
id | pubmed-6482003 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-64820032019-04-24 Weakly supervised natural language processing for assessing patient-centered outcome following prostate cancer treatment Banerjee, Imon Li, Kevin Seneviratne, Martin Ferrari, Michelle Seto, Tina Brooks, James D Rubin, Daniel L Hernandez-Boussard, Tina JAMIA Open Research and Applications BACKGROUND: The population-based assessment of patient-centered outcomes (PCOs) has been limited by the efficient and accurate collection of these data. Natural language processing (NLP) pipelines can determine whether a clinical note within an electronic medical record contains evidence on these data. We present and demonstrate the accuracy of an NLP pipeline that targets to assess the presence, absence, or risk discussion of two important PCOs following prostate cancer treatment: urinary incontinence (UI) and bowel dysfunction (BD). METHODS: We propose a weakly supervised NLP approach which annotates electronic medical record clinical notes without requiring manual chart review. A weighted function of neural word embedding was used to create a sentence-level vector representation of relevant expressions extracted from the clinical notes. Sentence vectors were used as input for a multinomial logistic model, with output being either presence, absence or risk discussion of UI/BD. The classifier was trained based on automated sentence annotation depending only on domain-specific dictionaries (weak supervision). RESULTS: The model achieved an average F1 score of 0.86 for the sentence-level, three-tier classification task (presence/absence/risk) in both UI and BD. The model also outperformed a pre-existing rule-based model for note-level annotation of UI with significant margin. 
CONCLUSIONS: We demonstrate a machine learning method to categorize clinical notes based on important PCOs that trains a classifier on sentence vector representations labeled with a domain-specific dictionary, which eliminates the need for manual engineering of linguistic rules or manual chart review for extracting the PCOs. The weakly supervised NLP pipeline showed promising sensitivity and specificity for identifying important PCOs in unstructured clinical text notes compared to rule-based algorithms. TRIAL REGISTRATION: This is a chart review study and approved by Institutional Review Board (IRB). Oxford University Press 2019-01-04 /pmc/articles/PMC6482003/ /pubmed/31032481 http://dx.doi.org/10.1093/jamiaopen/ooy057 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of the American Medical Informatics Association. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Research and Applications Banerjee, Imon Li, Kevin Seneviratne, Martin Ferrari, Michelle Seto, Tina Brooks, James D Rubin, Daniel L Hernandez-Boussard, Tina Weakly supervised natural language processing for assessing patient-centered outcome following prostate cancer treatment |
title | Weakly supervised natural language processing for assessing patient-centered outcome following prostate cancer treatment |
topic | Research and Applications |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6482003/ https://www.ncbi.nlm.nih.gov/pubmed/31032481 http://dx.doi.org/10.1093/jamiaopen/ooy057 |
work_keys_str_mv | AT banerjeeimon weaklysupervisednaturallanguageprocessingforassessingpatientcenteredoutcomefollowingprostatecancertreatment AT likevin weaklysupervisednaturallanguageprocessingforassessingpatientcenteredoutcomefollowingprostatecancertreatment AT seneviratnemartin weaklysupervisednaturallanguageprocessingforassessingpatientcenteredoutcomefollowingprostatecancertreatment AT ferrarimichelle weaklysupervisednaturallanguageprocessingforassessingpatientcenteredoutcomefollowingprostatecancertreatment AT setotina weaklysupervisednaturallanguageprocessingforassessingpatientcenteredoutcomefollowingprostatecancertreatment AT brooksjamesd weaklysupervisednaturallanguageprocessingforassessingpatientcenteredoutcomefollowingprostatecancertreatment AT rubindaniell weaklysupervisednaturallanguageprocessingforassessingpatientcenteredoutcomefollowingprostatecancertreatment AT hernandezboussardtina weaklysupervisednaturallanguageprocessingforassessingpatientcenteredoutcomefollowingprostatecancertreatment |
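
The METHODS summary in the description field names three steps: dictionary-based sentence labeling, a weighted combination of word embeddings into a sentence vector, and a multinomial logistic model over those vectors. The following is a minimal Python sketch of that general idea, not the authors' implementation. Everything concrete in it is an assumption made for illustration: the term lists, the three-dimensional toy embeddings, the per-token weights, and the helper names are hypothetical, and a plain softmax regression trained by gradient descent stands in for the multinomial logistic model. The record does not specify the actual dictionaries, embedding model, or weighting function.

```python
import math

# Hypothetical dictionaries (assumption: the real term lists are not in this record).
OUTCOME_TERMS = {"incontinence", "leakage", "pads"}
NEGATION_TERMS = {"no", "denies", "without"}
RISK_TERMS = {"risk", "risks", "possible", "may"}
LABELS = ["presence", "absence", "risk"]

# Toy 3-d embeddings and per-token weights, both invented for this sketch.
EMBEDDINGS = {
    "incontinence": [1.0, 0.0, 0.0], "leakage": [1.0, 0.0, 0.0],
    "pads": [1.0, 0.0, 0.0], "denies": [0.0, 1.0, 0.0],
    "no": [0.0, 1.0, 0.0], "risk": [0.0, 0.0, 1.0],
}
WEIGHTS = {"incontinence": 0.5}  # any token not listed gets weight 1.0


def tokenize(sentence):
    """Lowercase whitespace tokens with surrounding punctuation removed."""
    tokens = (t.strip(".,;:").lower() for t in sentence.split())
    return [t for t in tokens if t]


def weak_label(sentence):
    """Dictionary-only label; None when no outcome term is mentioned."""
    tokens = set(tokenize(sentence))
    if not tokens & OUTCOME_TERMS:
        return None
    if tokens & RISK_TERMS:
        return "risk"
    if tokens & NEGATION_TERMS:
        return "absence"
    return "presence"


def sentence_vector(sentence):
    """Weighted average of the embeddings of known tokens."""
    acc, total = [0.0, 0.0, 0.0], 0.0
    for tok in tokenize(sentence):
        if tok in EMBEDDINGS:
            w = WEIGHTS.get(tok, 1.0)
            total += w
            acc = [a + w * x for a, x in zip(acc, EMBEDDINGS[tok])]
    return [a / total for a in acc] if total else acc


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def train_softmax(X, y, n_classes, epochs=500, lr=0.5):
    """Multinomial logistic regression fitted by per-example gradient descent."""
    W = [[0.0] * (len(X[0]) + 1) for _ in range(n_classes)]  # last column: bias
    for _ in range(epochs):
        for x, label in zip(X, y):
            xb = x + [1.0]
            p = softmax([sum(w * v for w, v in zip(row, xb)) for row in W])
            for k in range(n_classes):
                grad = p[k] - (1.0 if k == label else 0.0)
                W[k] = [w - lr * grad * v for w, v in zip(W[k], xb)]
    return W


def predict(W, sentence):
    xb = sentence_vector(sentence) + [1.0]
    scores = [sum(w * v for w, v in zip(row, xb)) for row in W]
    return LABELS[max(range(len(W)), key=scores.__getitem__)]


# Made-up example sentences, not taken from any clinical note.
SENTENCES = [
    "Patient reports urinary incontinence and uses pads.",
    "He denies incontinence.",
    "Discussed the risk of incontinence after surgery.",
    "Follow up in six weeks.",
]
labeled = [(s, weak_label(s)) for s in SENTENCES if weak_label(s) is not None]
X = [sentence_vector(s) for s, _ in labeled]
y = [LABELS.index(lab) for _, lab in labeled]
W = train_softmax(X, y, n_classes=len(LABELS))
```

Calling `predict(W, "He denies incontinence.")` then returns a label learned from the sentence vectors rather than from the dictionary rules directly. In this sketch the dictionaries supply the training labels and the classifier generalizes from the embedding space, which is the property the abstract attributes to weak supervision.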