
The utility of including pathology reports in improving the computational identification of patients


Bibliographic Details
Main Authors: Chen, Wei, Huang, Yungui, Boyle, Brendan, Lin, Simon
Format: Online Article Text
Language: English
Published: Medknow Publications & Media Pvt Ltd 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5139449/
https://www.ncbi.nlm.nih.gov/pubmed/27994938
http://dx.doi.org/10.4103/2153-3539.194838
Description
Summary: BACKGROUND: Celiac disease (CD) is a common autoimmune disorder. Efficient identification of patients may improve chronic management of the disease. Prior studies have shown that searching International Classification of Diseases-9 (ICD-9) codes alone is inaccurate for identifying patients with CD. In this study, we developed automated classification algorithms leveraging pathology reports and other clinical data in Electronic Health Records (EHRs) to refine the subset population preselected using the ICD-9 code (579.0). MATERIALS AND METHODS: EHRs were searched for the established ICD-9 code (579.0) suggesting CD, which yielded an initial set of identified cases. In addition, laboratory results for tissue transglutaminase were extracted. Using natural language processing, we analyzed pathology reports from upper endoscopy. Twelve machine learning classifiers using different combinations of variables related to ICD-9 CD status, laboratory result status, and pathology reports were evaluated to find the best possible CD classifier. Ten-fold cross-validation was used to assess the results. RESULTS: A total of 1498 patient records were used, including 363 confirmed cases and 1135 false positive cases that served as controls. A logistic model based on both clinical and pathology report features produced the best results: Kappa of 0.78, F1 of 0.92, and area under the curve (AUC) of 0.94, whereas using ICD-9 alone produced poor results: Kappa of 0.28, F1 of 0.75, and AUC of 0.63. CONCLUSION: Our automated classification system presented an efficient and reliable way to improve the performance of CD patient identification.
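
As a reading aid for the metrics quoted in the summary, the following is a minimal Python sketch of how Cohen's kappa and F1 can be computed from a binary confusion matrix, and of how record indices can be divided into ten folds. It is not the authors' code. The names `Confusion`, `f1_score`, `cohen_kappa`, and `k_fold_indices` are hypothetical, the round-robin fold assignment is an assumption (the summary does not say how folds were drawn), and the example counts are invented for illustration; only the totals of 363 cases and 1135 controls come from the summary.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Confusion:
    """Counts from a binary confusion matrix (positive = confirmed CD)."""
    tp: int  # confirmed cases predicted as cases
    fp: int  # controls predicted as cases
    fn: int  # confirmed cases predicted as controls
    tn: int  # controls predicted as controls


def f1_score(c: Confusion) -> float:
    """F1 for the positive class: harmonic mean of precision and recall."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def cohen_kappa(c: Confusion) -> float:
    """Cohen's kappa: agreement between prediction and truth beyond chance."""
    n = c.tp + c.fp + c.fn + c.tn
    if n == 0:
        raise ValueError("empty confusion matrix")
    observed = (c.tp + c.tn) / n
    # Chance agreement from the row and column marginals.
    expected = (
        (c.tp + c.fp) * (c.tp + c.fn) + (c.fn + c.tn) * (c.fp + c.tn)
    ) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate case: only one class present and predicted
    return (observed - expected) / (1.0 - expected)


def k_fold_indices(n: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; record i goes to test fold i % k.

    Assumption: plain round-robin assignment, no shuffling or stratification.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test


if __name__ == "__main__":
    # Invented counts that sum to 363 cases and 1135 controls.
    example = Confusion(tp=320, fp=40, fn=43, tn=1095)
    print(f"F1    = {f1_score(example):.3f}")
    print(f"kappa = {cohen_kappa(example):.3f}")
    for train_idx, test_idx in k_fold_indices(1498, k=10):
        print(len(train_idx), len(test_idx))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when the classes are imbalanced, as they are with 363 cases against 1135 controls.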