
Using General-purpose Sentiment Lexicons for Suicide Risk Assessment in Electronic Health Records: Corpus-Based Analysis

BACKGROUND: Suicide is a serious public health issue, accounting for 1.4% of all deaths worldwide. Current risk assessment tools are reported as performing little better than chance in predicting suicide. New methods for studying dynamic features in electronic health records (EHRs) are being increas...


Bibliographic Details
Main authors: Bittar, André; Velupillai, Sumithra; Roberts, Angus; Dutta, Rina
Format: Online Article Text
Language: English
Published: JMIR Publications 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8080148/
https://www.ncbi.nlm.nih.gov/pubmed/33847595
http://dx.doi.org/10.2196/22397
author Bittar, André
Velupillai, Sumithra
Roberts, Angus
Dutta, Rina
collection PubMed
description BACKGROUND: Suicide is a serious public health issue, accounting for 1.4% of all deaths worldwide. Current risk assessment tools are reported as performing little better than chance in predicting suicide. New methods for studying dynamic features in electronic health records (EHRs) are being increasingly explored. One avenue of research involves using sentiment analysis to examine clinicians’ subjective judgments when reporting on patients. Several recent studies have used general-purpose sentiment analysis tools to automatically identify negative and positive words within EHRs to test correlations between sentiment extracted from the texts and specific medical outcomes (eg, risk of suicide or in-hospital mortality). However, little attention has been paid to analyzing the specific words identified by general-purpose sentiment lexicons when applied to EHR corpora.
OBJECTIVE: This study aims to quantitatively and qualitatively evaluate the coverage of six general-purpose sentiment lexicons against a corpus of EHR texts to ascertain the extent to which such lexical resources are fit for use in suicide risk assessment.
METHODS: The data for this study were a corpus of 198,451 EHR texts made up of two subcorpora drawn from a 1:4 case-control study comparing clinical notes written over the period leading up to a suicide attempt (cases, n=2913) with those not preceding such an attempt (controls, n=14,727). We calculated word frequency distributions within each subcorpus to identify representative keywords for both the case and control subcorpora. We quantified the relative coverage of the 6 lexicons with respect to this list of representative keywords in terms of weighted precision, recall, and F score.
RESULTS: The six lexicons achieved reasonable precision (0.53-0.68) but very low recall (0.04-0.36). Many of the most representative keywords in the suicide-related (case) subcorpus were not identified by any of the lexicons. The sentiment-bearing status of these keywords for this use case is thus doubtful.
CONCLUSIONS: Our findings indicate that these 6 sentiment lexicons are not optimal for use in suicide risk assessment. We propose a set of guidelines for the creation of more suitable lexical resources for distinguishing suicide-related from non–suicide-related EHR texts.
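The coverage measure named in the abstract (weighted precision, recall, and F score of a lexicon against a list of representative keywords) can be illustrated with a small sketch. The abstract does not state how the weights were defined, so the scheme here is an assumption: each corpus word carries a numeric weight (for example, a frequency or keyness score), precision is the keyword share of the weight the lexicon matches in the corpus, and recall is the lexicon-covered share of total keyword weight. The function name `weighted_prf` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Dict, Set, Tuple


def weighted_prf(
    weights: Dict[str, float],
    keywords: Set[str],
    lexicon: Set[str],
) -> Tuple[float, float, float]:
    """Weighted precision, recall, and F score of a lexicon against keywords.

    ASSUMPTION: this weighting scheme is illustrative only; the abstract
    does not define the one used in the study.
    """
    # Lexicon entries that actually occur in the corpus vocabulary.
    matched = lexicon & weights.keys()
    # Weight of matched lexicon entries that are also representative keywords.
    hit = sum(weights[w] for w in matched & keywords)
    lexicon_total = sum(weights[w] for w in matched)
    keyword_total = sum(weights.get(w, 0.0) for w in keywords)

    precision = hit / lexicon_total if lexicon_total else 0.0
    recall = hit / keyword_total if keyword_total else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Toy data (hypothetical): word weights, representative keywords, one lexicon.
toy_weights = {"overdose": 4.0, "ligature": 3.0, "sad": 2.0,
               "happy": 1.0, "appointment": 2.0}
toy_keywords = {"overdose", "ligature", "sad"}
toy_lexicon = {"sad", "happy", "terrible"}

p, r, f = weighted_prf(toy_weights, toy_keywords, toy_lexicon)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

In this toy case the lexicon matches "sad" and "happy" but misses the heavily weighted keywords "overdose" and "ligature", so recall stays low while precision is moderate, the same qualitative pattern the RESULTS section reports.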
format Online
Article
Text
id pubmed-8080148
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-8080148 2021-05-06 Using General-purpose Sentiment Lexicons for Suicide Risk Assessment in Electronic Health Records: Corpus-Based Analysis Bittar, André Velupillai, Sumithra Roberts, Angus Dutta, Rina JMIR Med Inform Original Paper JMIR Publications 2021-04-13 /pmc/articles/PMC8080148/ /pubmed/33847595 http://dx.doi.org/10.2196/22397 Text en ©André Bittar, Sumithra Velupillai, Angus Roberts, Rina Dutta. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 13.04.2021. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
title Using General-purpose Sentiment Lexicons for Suicide Risk Assessment in Electronic Health Records: Corpus-Based Analysis
topic Original Paper