Simplified data science approach to extract social and behavioural determinants: a retrospective chart review
Main authors: | Teng, Andrew; Wilcox, Adam |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BMJ Publishing Group, 2022 |
Subjects: | Health Informatics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8768909/ https://www.ncbi.nlm.nih.gov/pubmed/35042703 http://dx.doi.org/10.1136/bmjopen-2020-048397 |
author | Teng, Andrew; Wilcox, Adam |
---|---|
collection | PubMed |
description | OBJECTIVES: We aim to extract a subset of social factors from clinical notes using common text classification methods. DESIGN: Retrospective chart review. SETTING: We collaborated with a local level I trauma hospital in an underserved area where about 6.5% of the patient population is housing unstable, and extracted text notes related to various social determinants for acute care patients. PARTICIPANTS: Notes were retrospectively extracted from 43 798 acute care patients. METHODS: We used only open-source Python packages to test simple text classification methods that could be easily generalised and implemented. We extracted social history text from various sources, such as admission and emergency department notes, over a 5-year timeframe and performed manual chart reviews to ensure data quality. We manually labelled the sentiment of the notes, treating each text entry independently. Four different models with two different feature selection methods (bag of words and bigrams) were used to classify and predict housing stability, tobacco use and alcohol use status for the extracted clinical text. RESULTS: Applying open-source classification techniques gave generally positive results; the accuracy scores were 91.2%, 84.7% and 82.8% for housing stability, tobacco use and alcohol use, respectively. Our analysis had several limitations, including social factors that were not present in the notes owing to patient condition, multiple copy-forward entries and shorthand. Additionally, it was difficult to translate usage degrees for tobacco and alcohol use. However, compared with structured data sources, our classification approach on unstructured notes yielded more results for housing and alcohol use; for tobacco use, unstructured notes proved less fruitful. |
format | Online Article Text |
id | pubmed-8768909 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BMJ Publishing Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-8768909; 2022-02-04; Simplified data science approach to extract social and behavioural determinants: a retrospective chart review; Teng, Andrew; Wilcox, Adam; BMJ Open; Health Informatics; BMJ Publishing Group; 2022-01-17; /pmc/articles/PMC8768909/; /pubmed/35042703; http://dx.doi.org/10.1136/bmjopen-2020-048397; Text; en; © Author(s) (or their employer(s)) 2022. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ. This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: https://creativecommons.org/licenses/by-nc/4.0/ |
title | Simplified data science approach to extract social and behavioural determinants: a retrospective chart review |
topic | Health Informatics |
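The abstract names the two feature representations (bag of words and bigrams) and reports accuracy, but it does not name the four classifiers. As a rough sketch only, the Python below builds both feature sets with the standard library and trains a multinomial naive Bayes classifier, which is an assumed stand-in and not necessarily one of the authors' models. The bigram setting here appends adjacent word pairs to the unigram counts; that is also an assumption, since the abstract does not say whether bigrams were used alone. The six training sentences and their housing labels are invented for illustration and are not study data. Accuracy is computed as the fraction of notes whose predicted label matches the manual label.

```python
import math
import re
from collections import Counter

# Hypothetical sketch: multinomial naive Bayes is an ASSUMED stand-in classifier;
# the abstract does not name the four models used in the study.
# Toy, invented social-history snippets for illustration only (NOT study data).
TRAIN = [
    ("Lives with wife in apartment", "stable"),
    ("Patient is homeless, staying in shelter", "unstable"),
    ("Lives alone at home", "stable"),
    ("Currently homeless, sleeps in car", "unstable"),
    ("Lives at home with family", "stable"),
    ("Staying at shelter downtown", "unstable"),
]
TEST = [
    ("Homeless, lives in car", "unstable"),
    ("Lives with family at home", "stable"),
]


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of letters/digits (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def featurize(text: str, use_bigrams: bool = False) -> Counter:
    """Bag-of-words counts, optionally extended with adjacent word pairs (bigrams)."""
    tokens = tokenize(text)
    feats = Counter(tokens)
    if use_bigrams:
        # Assumption: bigrams are added on top of unigrams, not used alone.
        feats.update(f"{a} {b}" for a, b in zip(tokens, tokens[1:]))
    return feats


class MultinomialNB:
    """Multinomial naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs: list[Counter], labels: list[str]) -> "MultinomialNB":
        self.classes = sorted(set(labels))
        self.vocab = set().union(*docs)
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(doc)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        return self

    def predict(self, doc: Counter) -> str:
        v = len(self.vocab)

        def log_posterior(c: str) -> float:
            score = self.log_prior[c]
            for feat, k in doc.items():
                if feat not in self.vocab:
                    continue  # features never seen in training are ignored
                p = (self.counts[c][feat] + self.alpha) / (self.totals[c] + self.alpha * v)
                score += k * math.log(p)
            return score

        return max(self.classes, key=log_posterior)


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of predictions that match the manual labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate(use_bigrams: bool) -> tuple[list[str], float]:
    """Train on TRAIN, predict TEST, and return (predictions, accuracy)."""
    train_x = [featurize(t, use_bigrams) for t, _ in TRAIN]
    model = MultinomialNB().fit(train_x, [y for _, y in TRAIN])
    preds = [model.predict(featurize(t, use_bigrams)) for t, _ in TEST]
    return preds, accuracy([y for _, y in TEST], preds)


if __name__ == "__main__":
    for use_bigrams in (False, True):
        preds, acc = evaluate(use_bigrams)
        print("bigrams" if use_bigrams else "bag of words", preds, acc)
```

Running the file prints the predicted labels and accuracy for each feature setting; with six invented sentences these numbers say nothing about real-world performance. Skipping unseen features at prediction time is one simple choice among several. For real notes, scikit-learn's `CountVectorizer` (with its `ngram_range` parameter) combined with `MultinomialNB` would be a more practical route than this hand-rolled version.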