
The utility of including pathology reports in improving the computational identification of patients

BACKGROUND: Celiac disease (CD) is a common autoimmune disorder. Efficient identification of patients may improve chronic management of the disease. Prior studies have shown searching International Classification of Diseases-9 (ICD-9) codes alone is inaccurate for identifying patients with CD. In th...

Bibliographic Details
Main authors: Chen, Wei; Huang, Yungui; Boyle, Brendan; Lin, Simon
Format: Online Article Text
Language: English
Published: Medknow Publications & Media Pvt Ltd 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5139449/
https://www.ncbi.nlm.nih.gov/pubmed/27994938
http://dx.doi.org/10.4103/2153-3539.194838
_version_ 1782472251711422464
author Chen, Wei
Huang, Yungui
Boyle, Brendan
Lin, Simon
collection PubMed
description BACKGROUND: Celiac disease (CD) is a common autoimmune disorder. Efficient identification of patients may improve chronic management of the disease. Prior studies have shown that searching International Classification of Diseases-9 (ICD-9) codes alone is inaccurate for identifying patients with CD. In this study, we developed automated classification algorithms leveraging pathology reports and other clinical data in Electronic Health Records (EHRs) to refine the subset population preselected using ICD-9 code (579.0). MATERIALS AND METHODS: EHRs were searched for the established ICD-9 code (579.0) suggesting CD, from which an initial identification of cases was obtained. In addition, laboratory results for tissue transglutaminase were extracted. Using natural language processing, we analyzed pathology reports from upper endoscopy. Twelve machine learning classifiers using different combinations of variables related to ICD-9 CD status, laboratory result status, and pathology reports were evaluated to find the best possible CD classifier. Ten-fold cross-validation was used to assess the results. RESULTS: A total of 1498 patient records were used, including 363 confirmed cases and 1135 false positive cases that served as controls. A logistic model based on both clinical and pathology report features produced the best results: Kappa of 0.78, F1 of 0.92, and area under the curve (AUC) of 0.94, whereas using ICD-9 alone produced poor results: Kappa of 0.28, F1 of 0.75, and AUC of 0.63. CONCLUSION: Our automated classification system presented an efficient and reliable way to improve the performance of CD patient identification.
format Online
Article
Text
id pubmed-5139449
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Medknow Publications & Media Pvt Ltd
record_format MEDLINE/PubMed
spelling pubmed-5139449 2016-12-19 The utility of including pathology reports in improving the computational identification of patients Chen, Wei Huang, Yungui Boyle, Brendan Lin, Simon J Pathol Inform Original Article
Medknow Publications & Media Pvt Ltd 2016-11-29 /pmc/articles/PMC5139449/ /pubmed/27994938 http://dx.doi.org/10.4103/2153-3539.194838 Text en Copyright: © 2016 Journal of Pathology Informatics http://creativecommons.org/licenses/by-nc-sa/3.0 This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License, which allows others to remix, tweak, and build upon the work non-commercially, as long as the author is credited and the new creations are licensed under the identical terms.
title The utility of including pathology reports in improving the computational identification of patients
topic Original Article
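
The abstract reports Kappa and F1 for each classifier without showing how those scores relate to the cohort counts. The sketch below is only an illustration, not the authors' code: it computes both metrics from binary confusion-matrix counts using the standard formulas. The cohort sizes (363 confirmed cases, 1135 controls) come from the abstract; the class `Confusion`, both function names, and the counts in `hypothetical` are assumptions made up for this example. The flag-everything rule is not the study's "ICD-9 only" classifier, so its scores are not expected to match the reported figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts (positive class = confirmed CD)."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


def f1_score(c: Confusion) -> float:
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def cohen_kappa(c: Confusion) -> float:
    """Agreement between predicted and true labels, corrected for chance."""
    n = c.total
    observed = (c.tp + c.tn) / n
    # Chance agreement: product of the marginal rates, summed over both classes.
    expected = (
        (c.tp + c.fp) * (c.tp + c.fn) + (c.tn + c.fn) * (c.tn + c.fp)
    ) / (n * n)
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0


# Cohort sizes taken from the abstract.
CONFIRMED, CONTROLS = 363, 1135

# Trivial rule: treat every record carrying ICD-9 code 579.0 as a case.
icd9_flag_all = Confusion(tp=CONFIRMED, fp=CONTROLS, fn=0, tn=0)
ppv = icd9_flag_all.tp / (icd9_flag_all.tp + icd9_flag_all.fp)

# HYPOTHETICAL counts for a refined classifier; NOT results from the study.
hypothetical = Confusion(tp=330, fp=40, fn=33, tn=1095)

if __name__ == "__main__":
    print(f"PPV of the ICD-9 preselection: {ppv:.3f}")
    print(f"flag-all kappa: {cohen_kappa(icd9_flag_all):.3f}")
    print(f"hypothetical F1: {f1_score(hypothetical):.3f}")
    print(f"hypothetical kappa: {cohen_kappa(hypothetical):.3f}")
```

Accepting every preselected record as a case yields a positive predictive value of 363/1498, roughly 0.24, and a Kappa of zero because the rule never disagrees with itself by class. This is the gap that the pathology-report and laboratory features are meant to close.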