The utility of including pathology reports in improving the computational identification of patients
BACKGROUND: Celiac disease (CD) is a common autoimmune disorder. Efficient identification of patients may improve chronic management of the disease. Prior studies have shown searching International Classification of Diseases-9 (ICD-9) codes alone is inaccurate for identifying patients with CD. In th...
Main authors: | Chen, Wei; Huang, Yungui; Boyle, Brendan; Lin, Simon |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Medknow Publications & Media Pvt Ltd, 2016 |
Subjects: | Original Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5139449/ https://www.ncbi.nlm.nih.gov/pubmed/27994938 http://dx.doi.org/10.4103/2153-3539.194838 |
_version_ | 1782472251711422464 |
---|---|
author | Chen, Wei; Huang, Yungui; Boyle, Brendan; Lin, Simon |
author_facet | Chen, Wei; Huang, Yungui; Boyle, Brendan; Lin, Simon |
author_sort | Chen, Wei |
collection | PubMed |
description | BACKGROUND: Celiac disease (CD) is a common autoimmune disorder. Efficient identification of patients may improve chronic management of the disease. Prior studies have shown that searching International Classification of Diseases-9 (ICD-9) codes alone is inaccurate for identifying patients with CD. In this study, we developed automated classification algorithms leveraging pathology reports and other clinical data in Electronic Health Records (EHRs) to refine the subset population preselected using the ICD-9 code (579.0). MATERIALS AND METHODS: EHRs were searched for the established ICD-9 code (579.0) suggesting CD, from which an initial identification of cases was obtained. In addition, laboratory results for tissue transglutaminase were extracted. Using natural language processing, we analyzed pathology reports from upper endoscopy. Twelve machine learning classifiers using different combinations of variables related to ICD-9 CD status, laboratory result status, and pathology reports were tested to find the best possible CD classifier. Ten-fold cross-validation was used to assess the results. RESULTS: A total of 1498 patient records were used, including 363 confirmed cases and 1135 false positive cases that served as controls. A logistic model based on both clinical and pathology report features produced the best results: Kappa of 0.78, F1 of 0.92, and area under the curve (AUC) of 0.94, whereas using ICD-9 only generated poor results: Kappa of 0.28, F1 of 0.75, and AUC of 0.63. CONCLUSION: Our automated classification system presented an efficient and reliable way to improve the performance of CD patient identification. |
format | Online Article Text |
id | pubmed-5139449 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Medknow Publications & Media Pvt Ltd |
record_format | MEDLINE/PubMed |
spelling | pubmed-5139449 2016-12-19 The utility of including pathology reports in improving the computational identification of patients Chen, Wei; Huang, Yungui; Boyle, Brendan; Lin, Simon J Pathol Inform Original Article Medknow Publications & Media Pvt Ltd 2016-11-29 /pmc/articles/PMC5139449/ /pubmed/27994938 http://dx.doi.org/10.4103/2153-3539.194838 Text en Copyright: © 2016 Journal of Pathology Informatics http://creativecommons.org/licenses/by-nc-sa/3.0 This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License, which allows others to remix, tweak, and build upon the work non-commercially, as long as the author is credited and the new creations are licensed under the identical terms. |
spellingShingle | Original Article Chen, Wei Huang, Yungui Boyle, Brendan Lin, Simon The utility of including pathology reports in improving the computational identification of patients |
title | The utility of including pathology reports in improving the computational identification of patients |
title_full | The utility of including pathology reports in improving the computational identification of patients |
title_fullStr | The utility of including pathology reports in improving the computational identification of patients |
title_full_unstemmed | The utility of including pathology reports in improving the computational identification of patients |
title_short | The utility of including pathology reports in improving the computational identification of patients |
title_sort | utility of including pathology reports in improving the computational identification of patients |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5139449/ https://www.ncbi.nlm.nih.gov/pubmed/27994938 http://dx.doi.org/10.4103/2153-3539.194838 |
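
The abstract reports Kappa and F1 for each classifier but does not show how such figures follow from a confusion matrix. The sketch below uses the standard textbook definitions; the `Confusion` class, the function names, and the "accept every coded patient" baseline are assumptions made for illustration and are not taken from the article. The record does not say which F1 variant the authors used (for example, positive-class or class-weighted), so these functions are not expected to reproduce the published values. Only the class sizes (363 confirmed cases, 1135 controls) come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion counts; 'positive' means confirmed celiac disease."""
    tp: int  # confirmed cases predicted as cases
    fp: int  # controls predicted as cases
    fn: int  # confirmed cases predicted as controls
    tn: int  # controls predicted as controls

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


def precision(c: Confusion) -> float:
    """Fraction of predicted cases that are confirmed cases."""
    return c.tp / (c.tp + c.fp)


def f1_score(c: Confusion) -> float:
    """Positive-class F1: harmonic mean of precision and recall."""
    return 2 * c.tp / (2 * c.tp + c.fp + c.fn)


def cohen_kappa(c: Confusion) -> float:
    """Agreement between prediction and truth, corrected for chance."""
    n = c.total
    observed = (c.tp + c.tn) / n
    pred_pos = (c.tp + c.fp) / n  # share of patients predicted positive
    true_pos = (c.tp + c.fn) / n  # share of patients truly positive
    expected = pred_pos * true_pos + (1 - pred_pos) * (1 - true_pos)
    return (observed - expected) / (1 - expected)


# Class sizes reported in the abstract.
CONFIRMED, CONTROLS = 363, 1135

# Assumed baseline: treat every patient carrying ICD-9 579.0 as a case.
# This is NOT the article's "ICD-9 only" classifier, just a reference point.
accept_all = Confusion(tp=CONFIRMED, fp=CONTROLS, fn=0, tn=0)

# A classifier that makes no errors, for comparison.
perfect = Confusion(tp=CONFIRMED, fp=0, fn=0, tn=CONTROLS)

if __name__ == "__main__":
    for name, c in [("accept_all", accept_all), ("perfect", perfect)]:
        print(f"{name}: precision={precision(c):.3f} "
              f"F1={f1_score(c):.3f} kappa={cohen_kappa(c):.3f}")
```

Under the accept-all assumption, precision equals the share of confirmed cases in the preselected population (363 of 1498, about 24%), and Kappa is zero because a constant prediction agrees with the truth no better than chance.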