
Comparison of Three Information Sources for Smoking Information in Electronic Health Records

OBJECTIVE: The primary aim was to compare independent and joint performance of retrieving smoking status through different sources, including narrative text processed by natural language processing (NLP), patient-provided information (PPI), and diagnosis codes (ie, International Classification of Di...


Bibliographic Details
Main authors: Wang, Liwei; Ruan, Xiaoyang; Yang, Ping; Liu, Hongfang
Format: Online Article Text
Language: English
Published: Libertas Academica 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5147453/
https://www.ncbi.nlm.nih.gov/pubmed/27980387
http://dx.doi.org/10.4137/CIN.S40604
_version_ 1782473688885493760
author Wang, Liwei
Ruan, Xiaoyang
Yang, Ping
Liu, Hongfang
author_facet Wang, Liwei
Ruan, Xiaoyang
Yang, Ping
Liu, Hongfang
author_sort Wang, Liwei
collection PubMed
description OBJECTIVE: The primary aim was to compare independent and joint performance of retrieving smoking status through different sources, including narrative text processed by natural language processing (NLP), patient-provided information (PPI), and diagnosis codes (ie, International Classification of Diseases, Ninth Revision [ICD-9]). We also compared the performance of retrieving smoking strength information (ie, heavy/light smoker) from narrative text and PPI. MATERIALS AND METHODS: Our study leveraged an existing lung cancer cohort for smoking status, amount, and strength information, which was manually chart-reviewed. On the NLP side, smoking-related electronic medical record (EMR) data were retrieved first. A pattern-based smoking information extraction module was then implemented to extract smoking-related information. After that, heuristic rules were used to obtain smoking status-related information. Smoking information was also obtained from structured data sources based on diagnosis codes and PPI. Sensitivity, specificity, and accuracy were measured using patients with coverage (ie, the proportion of patients whose smoking status/strength can be effectively determined). RESULTS: NLP alone has the best overall performance for smoking status extraction (patient coverage: 0.88; sensitivity: 0.97; specificity: 0.70; accuracy: 0.88); combining PPI with NLP further improved patient coverage to 0.96. ICD-9 does not provide additional improvement to NLP and its combination with PPI. For smoking strength, combining NLP with PPI has slight improvement over NLP alone. CONCLUSION: These findings suggest that narrative text could serve as a more reliable and comprehensive source for obtaining smoking-related information than structured data sources. PPI, the readily available structured data, could be used as a complementary source for more comprehensive patient coverage.
format Online
Article
Text
id pubmed-5147453
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-5147453 2016-12-15 Comparison of Three Information Sources for Smoking Information in Electronic Health Records Wang, Liwei Ruan, Xiaoyang Yang, Ping Liu, Hongfang Cancer Inform Original Research OBJECTIVE: The primary aim was to compare independent and joint performance of retrieving smoking status through different sources, including narrative text processed by natural language processing (NLP), patient-provided information (PPI), and diagnosis codes (ie, International Classification of Diseases, Ninth Revision [ICD-9]). We also compared the performance of retrieving smoking strength information (ie, heavy/light smoker) from narrative text and PPI. MATERIALS AND METHODS: Our study leveraged an existing lung cancer cohort for smoking status, amount, and strength information, which was manually chart-reviewed. On the NLP side, smoking-related electronic medical record (EMR) data were retrieved first. A pattern-based smoking information extraction module was then implemented to extract smoking-related information. After that, heuristic rules were used to obtain smoking status-related information. Smoking information was also obtained from structured data sources based on diagnosis codes and PPI. Sensitivity, specificity, and accuracy were measured using patients with coverage (ie, the proportion of patients whose smoking status/strength can be effectively determined). RESULTS: NLP alone has the best overall performance for smoking status extraction (patient coverage: 0.88; sensitivity: 0.97; specificity: 0.70; accuracy: 0.88); combining PPI with NLP further improved patient coverage to 0.96. ICD-9 does not provide additional improvement to NLP and its combination with PPI. For smoking strength, combining NLP with PPI has slight improvement over NLP alone. CONCLUSION: These findings suggest that narrative text could serve as a more reliable and comprehensive source for obtaining smoking-related information than structured data sources. PPI, the readily available structured data, could be used as a complementary source for more comprehensive patient coverage. Libertas Academica 2016-12-08 /pmc/articles/PMC5147453/ /pubmed/27980387 http://dx.doi.org/10.4137/CIN.S40604 Text en © 2016 the author(s), publisher and licensee Libertas Academica Ltd. This is an open-access article distributed under the terms of the Creative Commons CC-BY-NC 3.0 license.
spellingShingle Original Research
Wang, Liwei
Ruan, Xiaoyang
Yang, Ping
Liu, Hongfang
Comparison of Three Information Sources for Smoking Information in Electronic Health Records
title Comparison of Three Information Sources for Smoking Information in Electronic Health Records
title_full Comparison of Three Information Sources for Smoking Information in Electronic Health Records
title_fullStr Comparison of Three Information Sources for Smoking Information in Electronic Health Records
title_full_unstemmed Comparison of Three Information Sources for Smoking Information in Electronic Health Records
title_short Comparison of Three Information Sources for Smoking Information in Electronic Health Records
title_sort comparison of three information sources for smoking information in electronic health records
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5147453/
https://www.ncbi.nlm.nih.gov/pubmed/27980387
http://dx.doi.org/10.4137/CIN.S40604
work_keys_str_mv AT wangliwei comparisonofthreeinformationsourcesforsmokinginformationinelectronichealthrecords
AT ruanxiaoyang comparisonofthreeinformationsourcesforsmokinginformationinelectronichealthrecords
AT yangping comparisonofthreeinformationsourcesforsmokinginformationinelectronichealthrecords
AT liuhongfang comparisonofthreeinformationsourcesforsmokinginformationinelectronichealthrecords
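
Illustrative note on the evaluation measures: the abstract reports patient coverage, sensitivity, specificity, and accuracy, with the latter three measured only on patients for whom a source yields a smoking status. The Python sketch below shows one way such figures could be computed. It is not the authors' code. The function names (`evaluate`, `combine`), the use of `None` for "status not determined", and the fallback rule that takes the secondary source only when the primary one is silent are all assumptions made for illustration, and the sample data is invented.

```python
from typing import Dict, List, Optional, Sequence


def evaluate(gold: Sequence[bool],
             predicted: Sequence[Optional[bool]]) -> Dict[str, float]:
    """Score one source against chart-reviewed labels.

    gold[i] is True if patient i is a smoker per manual chart review.
    predicted[i] is the source's answer, or None when the source could
    not determine a status (assumed convention, not from the paper).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Only patients with a determined status enter the confusion matrix.
    covered = [(g, p) for g, p in zip(gold, predicted) if p is not None]
    tp = sum(1 for g, p in covered if g and p)
    tn = sum(1 for g, p in covered if not g and not p)
    fp = sum(1 for g, p in covered if not g and p)
    fn = sum(1 for g, p in covered if g and not p)

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "coverage": ratio(len(covered), len(gold)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(covered)),
    }


def combine(primary: Sequence[Optional[bool]],
            secondary: Sequence[Optional[bool]]) -> List[Optional[bool]]:
    """Hypothetical combination rule: use the primary source when it has
    an answer, otherwise fall back to the secondary source."""
    return [p if p is not None else s for p, s in zip(primary, secondary)]


if __name__ == "__main__":
    # Invented toy data for five patients.
    gold = [True, True, True, False, False]
    nlp = [True, True, None, False, True]
    ppi = [None, False, True, None, False]
    print(evaluate(gold, nlp))
    print(evaluate(gold, combine(nlp, ppi)))
```

Under this fallback rule the combined source can only keep or raise coverage relative to the primary source, which is the direction of the effect the abstract reports for adding PPI to NLP.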