Building a tobacco user registry by extracting multiple smoking behaviors from clinical notes

BACKGROUND: Usage of structured fields in Electronic Health Records (EHRs) to ascertain smoking history is important but fails in capturing the nuances of smoking behaviors. Knowledge of smoking behaviors, such as pack year history and most recent cessation date, allows care providers to select the...

Bibliographic Details
Main authors: Palmer, Ellen L., Hassanpour, Saeed, Higgins, John, Doherty, Jennifer A., Onega, Tracy
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6657102/
https://www.ncbi.nlm.nih.gov/pubmed/31340796
http://dx.doi.org/10.1186/s12911-019-0863-3
author Palmer, Ellen L.
Hassanpour, Saeed
Higgins, John
Doherty, Jennifer A.
Onega, Tracy
author_sort Palmer, Ellen L.
collection PubMed
description BACKGROUND: Usage of structured fields in Electronic Health Records (EHRs) to ascertain smoking history is important but fails in capturing the nuances of smoking behaviors. Knowledge of smoking behaviors, such as pack year history and most recent cessation date, allows care providers to select the best care plan for patients at risk of smoking attributable diseases. METHODS: We developed and evaluated a health informatics pipeline for identifying complete smoking history from clinical notes in EHRs. We utilized 758 patient-visit notes (from visits between 03/28/2016 and 04/04/2016) from our local EHR in addition to a public dataset of 502 clinical notes from the 2006 i2b2 Challenge to assess the performance of this pipeline. We used a machine-learning classifier to extract smoking status and a comprehensive set of text processing regular expressions to extract pack years and cessation date information from these clinical notes. RESULTS: We identified smoking status with an F1 score of 0.90 on both the i2b2 and local data sets. Regular expression identification of pack year history in the local test set was 91.7% sensitive and 95.2% specific, but due to variable context the pack year extraction was incomplete in 25% of cases, extracting packs per day or years smoked only. Regular expression identification of cessation date was 63.2% sensitive and 94.6% specific. CONCLUSIONS: Our work indicates that the development of an EHR-based Smokers’ Registry containing information relating to smoking behaviors, not just status, from free-text clinical notes using an informatics pipeline is feasible. This pipeline is capable of functioning in external EHRs, reducing the amount of time and money needed at the institute-level to create a Smokers’ Registry for improved identification of patient risk and eligibility for preventative and early detection services. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12911-019-0863-3) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6657102
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6657102 2019-07-31 BMC Med Inform Decis Mak Research Article BioMed Central 2019-07-25 /pmc/articles/PMC6657102/ /pubmed/31340796 http://dx.doi.org/10.1186/s12911-019-0863-3 Text en © The Author(s). 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Building a tobacco user registry by extracting multiple smoking behaviors from clinical notes
topic Research Article
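The abstract in this record describes extracting pack years and cessation dates from free-text notes with regular expressions, and reports that some extractions recovered only packs per day or only years smoked. The following Python sketch illustrates that general idea only. The patterns, the function name `extract_smoking_history`, and the example notes are hypothetical and are not the expressions or data used by the authors. It assumes the standard definition pack-years = packs per day × years smoked.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns for illustration; NOT the expressions used in the paper.
PACK_YEARS = re.compile(r"(\d+(?:\.\d+)?)[\s-]*pack[- ]?years?", re.IGNORECASE)
PACKS_PER_DAY = re.compile(
    r"(\d+(?:\.\d+)?)\s*(?:packs?\s*(?:per|a|/)\s*day|ppd)\b", re.IGNORECASE
)
YEARS_SMOKED = re.compile(r"\b(?:for|x)\s*(\d+)\s*(?:years?|yrs?)\b", re.IGNORECASE)
QUIT_YEAR = re.compile(
    r"\b(?:quit|stopped)\s+(?:smoking\s+)?in\s+((?:19|20)\d{2})\b", re.IGNORECASE
)


def extract_smoking_history(note: str) -> Dict[str, Optional[float]]:
    """Return whichever smoking-history fields can be found in a note.

    Fields that are not found stay None, so a partial extraction
    (packs per day only, or years smoked only) remains visible.
    """
    result: Dict[str, Optional[float]] = {
        "pack_years": None,
        "packs_per_day": None,
        "years_smoked": None,
        "quit_year": None,
    }

    match = PACK_YEARS.search(note)
    if match:
        result["pack_years"] = float(match.group(1))

    match = PACKS_PER_DAY.search(note)
    if match:
        result["packs_per_day"] = float(match.group(1))

    match = YEARS_SMOKED.search(note)
    if match:
        result["years_smoked"] = int(match.group(1))

    match = QUIT_YEAR.search(note)
    if match:
        result["quit_year"] = int(match.group(1))

    # Derive pack-years only when no explicit value was stated
    # and both components were found.
    if (
        result["pack_years"] is None
        and result["packs_per_day"] is not None
        and result["years_smoked"] is not None
    ):
        result["pack_years"] = result["packs_per_day"] * result["years_smoked"]

    return result


if __name__ == "__main__":
    # Made-up example notes, not taken from any dataset.
    for text in (
        "30 pack-year history, quit in 2010",
        "Smokes 1.5 packs per day for 20 years",
        "Smokes 2 ppd",
    ):
        print(text, "->", extract_smoking_history(text))
```

Leaving missing fields as `None` rather than guessing keeps incomplete extractions distinguishable from complete ones, which matters when such output is later compared against manual review to estimate sensitivity and specificity.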