
Simplified data science approach to extract social and behavioural determinants: a retrospective chart review

OBJECTIVES: We aim to extract a subset of social factors from clinical notes using common text classification methods. DESIGN: Retrospective chart review. SETTING: We collaborated with a local level I trauma hospital located in an underserved area that has a housing unstable patient population of ab...


Bibliographic Details
Main Authors: Teng, Andrew; Wilcox, Adam
Format: Online Article Text
Language: English
Published: BMJ Publishing Group 2022
Subjects: Health Informatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8768909/
https://www.ncbi.nlm.nih.gov/pubmed/35042703
http://dx.doi.org/10.1136/bmjopen-2020-048397
author Teng, Andrew
Wilcox, Adam
collection PubMed
description OBJECTIVES: We aim to extract a subset of social factors from clinical notes using common text classification methods. DESIGN: Retrospective chart review. SETTING: We collaborated with a local level I trauma hospital located in an underserved area that has a housing unstable patient population of about 6.5% and extracted text notes related to various social determinants for acute care patients. PARTICIPANTS: Notes were retrospectively extracted from 43 798 acute care patients. METHODS: We solely use open source Python packages to test simple text classification methods that can potentially be easily generalisable and implemented. We extracted social history text from various sources, such as admission and emergency department notes, over a 5-year timeframe and performed manual chart reviews to ensure data quality. We manually labelled the sentiment of the notes, treating each text entry independently. Four different models with two different feature selection methods (bag of words and bigrams) were used to classify and predict housing stability, tobacco use and alcohol use status for the extracted clinical text. RESULTS: From our analysis, we found overall positive results and metrics in applying open-source classification techniques; the accuracy scores were 91.2%, 84.7%, 82.8% for housing stability, tobacco use and alcohol use, respectively. There were many limitations in our analysis including social factors not present due to patient condition, multiple copy-forward entries and shorthand. Additionally, it was difficult to translate usage degrees for tobacco and alcohol use. However, when compared with structured data sources, our classification approach on unstructured notes yielded more results for housing and alcohol use; tobacco use proved less fruitful for unstructured notes.
format Online
Article
Text
id pubmed-8768909
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BMJ Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-8768909 2022-02-04 Simplified data science approach to extract social and behavioural determinants: a retrospective chart review Teng, Andrew Wilcox, Adam BMJ Open Health Informatics BMJ Publishing Group 2022-01-17 /pmc/articles/PMC8768909/ /pubmed/35042703 http://dx.doi.org/10.1136/bmjopen-2020-048397 Text en © Author(s) (or their employer(s)) 2022. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ. This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: https://creativecommons.org/licenses/by-nc/4.0/
title Simplified data science approach to extract social and behavioural determinants: a retrospective chart review
topic Health Informatics
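The abstract mentions bag-of-words and bigram features fed to text classifiers but does not name the four models or the packages used. The following is a minimal sketch of that general idea, not the authors' pipeline: it uses only the Python standard library, substitutes a small multinomial Naive Bayes classifier as a stand-in model, and trains on invented example notes. The names `extract_features` and `NaiveBayes`, the example notes and the labels `stable`/`unstable` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def extract_features(text, use_bigrams=False):
    """Return bag-of-words counts, optionally adding adjacent-word bigrams."""
    tokens = text.lower().split()
    features = Counter(tokens)
    if use_bigrams:
        features.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return features


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (stand-in model,
    not one confirmed by the source)."""

    def fit(self, texts, labels, use_bigrams=False):
        self.use_bigrams = use_bigrams
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = extract_features(text, use_bigrams)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        feats = extract_features(text, self.use_bigrams)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each known feature.
            score = math.log(doc_count / total_docs)
            for feat, n in feats.items():
                if feat in self.vocab:
                    score += n * math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented social-history snippets; real notes would need manual labelling.
notes = [
    "patient is homeless and staying in a shelter",
    "currently homeless sleeping in car",
    "lives with wife in own home",
    "lives in apartment with family",
]
labels = ["unstable", "unstable", "stable", "stable"]

model = NaiveBayes().fit(notes, labels, use_bigrams=True)
prediction = model.predict("patient homeless staying at shelter")
print(prediction)
```

Setting `use_bigrams=False` restricts the features to single-word counts, which allows the two feature sets to be compared on the same labelled notes.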