Building a tobacco user registry by extracting multiple smoking behaviors from clinical notes
BACKGROUND: Using structured fields in Electronic Health Records (EHRs) to ascertain smoking history is important but fails to capture the nuances of smoking behaviors. Knowledge of smoking behaviors, such as pack-year history and most recent cessation date, allows care providers to select the...
Main authors: | Palmer, Ellen L.; Hassanpour, Saeed; Higgins, John; Doherty, Jennifer A.; Onega, Tracy |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6657102/ https://www.ncbi.nlm.nih.gov/pubmed/31340796 http://dx.doi.org/10.1186/s12911-019-0863-3 |
_version_ | 1783438744821432320 |
---|---|
author | Palmer, Ellen L. Hassanpour, Saeed Higgins, John Doherty, Jennifer A. Onega, Tracy |
author_sort | Palmer, Ellen L. |
collection | PubMed |
description | BACKGROUND: Using structured fields in Electronic Health Records (EHRs) to ascertain smoking history is important but fails to capture the nuances of smoking behaviors. Knowledge of smoking behaviors, such as pack-year history and most recent cessation date, allows care providers to select the best care plan for patients at risk of smoking-attributable diseases. METHODS: We developed and evaluated a health informatics pipeline for identifying complete smoking history from clinical notes in EHRs. To assess the performance of this pipeline, we used 758 patient-visit notes (from visits between 03/28/2016 and 04/04/2016) from our local EHR and a public dataset of 502 clinical notes from the 2006 i2b2 Challenge. We used a machine-learning classifier to extract smoking status and a comprehensive set of text-processing regular expressions to extract pack-year and cessation date information from these clinical notes. RESULTS: We identified smoking status with an F1 score of 0.90 on both the i2b2 and local data sets. Regular expression identification of pack-year history in the local test set was 91.7% sensitive and 95.2% specific, but because of variable context the pack-year extraction was incomplete in 25% of cases, capturing only packs per day or years smoked. Regular expression identification of cessation date was 63.2% sensitive and 94.6% specific. CONCLUSIONS: Our work indicates that it is feasible to develop an EHR-based Smokers’ Registry that uses an informatics pipeline to capture smoking behaviors, not just status, from free-text clinical notes. This pipeline is capable of functioning in external EHRs, reducing the time and money needed at the institution level to create a Smokers’ Registry for improved identification of patient risk and eligibility for preventive and early detection services. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12911-019-0863-3) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6657102 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6657102 2019-07-31 Building a tobacco user registry by extracting multiple smoking behaviors from clinical notes Palmer, Ellen L. Hassanpour, Saeed Higgins, John Doherty, Jennifer A. Onega, Tracy BMC Med Inform Decis Mak Research Article BioMed Central 2019-07-25 /pmc/articles/PMC6657102/ /pubmed/31340796 http://dx.doi.org/10.1186/s12911-019-0863-3 Text en © The Author(s). 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Building a tobacco user registry by extracting multiple smoking behaviors from clinical notes |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6657102/ https://www.ncbi.nlm.nih.gov/pubmed/31340796 http://dx.doi.org/10.1186/s12911-019-0863-3 |
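
The description above says that pack-years were extracted with regular expressions and that about 25% of extractions were incomplete, yielding only packs per day or years smoked. The following is a minimal Python sketch of that idea, not the authors' pipeline: the patterns, the `SmokingHistory` type, and the `extract_smoking_history` function are hypothetical and far simpler than a "comprehensive set" of expressions would be. It relies only on the standard definition that pack-years equal packs per day multiplied by years smoked.

```python
import re
from typing import Optional, TypedDict


class SmokingHistory(TypedDict):
    """Hypothetical result record; field names are illustrative only."""
    pack_years: Optional[float]
    packs_per_day: Optional[float]
    years_smoked: Optional[float]


# A number such as "30" or "1.5".
NUM = r"(\d+(?:\.\d+)?)"

# Illustrative patterns (assumptions, not taken from the article).
PACK_YEARS = re.compile(NUM + r"\s*-?\s*(?:pack|pk)[\s-]*(?:years?|yrs?)\b", re.I)
PACKS_PER_DAY = re.compile(NUM + r"\s*(?:ppd|packs?\s*(?:per|a|/)\s*day)", re.I)
YEARS_SMOKED = re.compile(r"\b(?:for|x)\s*" + NUM + r"\s*(?:years?|yrs?)\b", re.I)


def extract_smoking_history(note: str) -> SmokingHistory:
    """Return the first pack-year, packs-per-day and years-smoked mentions."""

    def first(pattern: "re.Pattern[str]") -> Optional[float]:
        match = pattern.search(note)
        return float(match.group(1)) if match else None

    pack_years = first(PACK_YEARS)
    packs_per_day = first(PACKS_PER_DAY)
    years_smoked = first(YEARS_SMOKED)

    # Derive pack-years only when both components are present;
    # otherwise the extraction stays incomplete (pack_years is None).
    if pack_years is None and packs_per_day is not None and years_smoked is not None:
        pack_years = packs_per_day * years_smoked

    return {
        "pack_years": pack_years,
        "packs_per_day": packs_per_day,
        "years_smoked": years_smoked,
    }


if __name__ == "__main__":
    for text in (
        "Patient has a 30 pack-year history.",
        "Smoked 1.5 packs per day for 20 years, quit 2010.",
        "Smokes 1 ppd.",
    ):
        print(text, "->", extract_smoking_history(text))
```

The third example note illustrates the incomplete case: packs per day is found, but without a duration no pack-year value can be computed, so the field is left empty rather than guessed.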