
Large-scale identification of undiagnosed hepatic steatosis using natural language processing



Bibliographic Details
Main authors: Schneider, Carolin V., Li, Tang, Zhang, David, Mezina, Anya I., Rattan, Puru, Huang, Helen, Creasy, Kate Townsend, Scorletti, Eleonora, Zandvakili, Inuk, Vujkovic, Marijana, Hehl, Leonida, Fiksel, Jacob, Park, Joseph, Wangensteen, Kirk, Risman, Marjorie, Chang, Kyong-Mi, Serper, Marina, Carr, Rotonya M., Schneider, Kai Markus, Chen, Jinbo, Rader, Daniel J.
Format: Online, Article, Text
Language: English
Published: Elsevier 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10432816/
https://www.ncbi.nlm.nih.gov/pubmed/37599905
http://dx.doi.org/10.1016/j.eclinm.2023.102149
Collection: PubMed
Abstract:
BACKGROUND: Nonalcoholic fatty liver disease (NAFLD) is a major cause of liver-related morbidity in people with and without diabetes, but it is underdiagnosed, posing challenges for research and clinical management. Here, we determine whether natural language processing (NLP) of data in the electronic health record (EHR) could identify undiagnosed patients with hepatic steatosis based on pathology and radiology reports.
METHODS: A rule-based NLP algorithm was built using a Linguamatics literature text-mining tool to search 2.15 million pathology reports and 2.7 million imaging reports in the Penn Medicine EHR from November 2014 through December 2020 for evidence of hepatic steatosis. For quality control, two independent physicians manually reviewed randomly chosen biopsy and imaging reports (n = 353; PPV 99.7%).
FINDINGS: After exclusion of individuals with other causes of hepatic steatosis, 3007 patients with biopsy-proven NAFLD and 42,083 patients with imaging-proven NAFLD were identified. Interestingly, elevated ALT was not a sensitive predictor of the presence of steatosis, and only half of the biopsied patients with steatosis ever received an ICD diagnosis code for NAFLD/NASH. There was a robust association between PNPLA3 and TM6SF2 risk alleles and steatosis identified by NLP. We identified 234 disorders that were significantly over- or underrepresented in all subjects with steatosis, as well as changes in serum markers (e.g., GGT) associated with the presence of steatosis.
INTERPRETATION: This study demonstrates the clear feasibility of NLP-based approaches for identifying patients whose steatosis was indicated in imaging and pathology reports within a large healthcare system, and it uncovers undercoding of NAFLD in the general population. Identifying patients at risk could link them to improved care and outcomes.
FUNDING: The study was funded by US and German funding sources that provided financial support only and had no influence or control over the research process.
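The abstract's METHODS section describes a rule-based NLP search built with Linguamatics, a commercial tool whose query rules are not given in this record. As a minimal sketch of the general idea only, assuming plain Python regular expressions in place of Linguamatics, the code below flags a report when it mentions steatosis without a simple negation cue shortly before the term, and computes the positive predictive value (PPV) used for the manual quality check. The term list, negation cues, window size, example reports and function names (`mentions_steatosis`, `positive_predictive_value`) are hypothetical illustrations, not the study's rules or data.

```python
import re
from typing import List, Sequence

# Hypothetical target phrases (illustrative only; NOT the study's Linguamatics query).
# More specific phrases come first so the alternation prefers them at a given position.
STEATOSIS_TERMS = [
    r"hepatic steatosis",
    r"steatosis",
    r"fatty (?:infiltration of the )?liver",
    r"fatty infiltration",
]

# Hypothetical negation cues, loosely in the spirit of NegEx-style rules.
NEGATION_CUES = [r"\bno evidence of\b", r"\bnegative for\b", r"\bwithout\b", r"\bno\b"]

TERM_RE = re.compile("|".join(STEATOSIS_TERMS), re.IGNORECASE)
NEG_RE = re.compile("|".join(NEGATION_CUES), re.IGNORECASE)


def split_sentences(text: str) -> List[str]:
    """Crude splitter: break after '.' or ';' followed by whitespace, or at newlines."""
    return [s.strip() for s in re.split(r"(?<=[.;])\s+|\n+", text) if s.strip()]


def mentions_steatosis(report: str, window: int = 5) -> bool:
    """Return True if some sentence asserts steatosis.

    A matched term is treated as negated when a negation cue appears among the
    `window` words before it, within the same comma- or colon-delimited clause.
    """
    for sentence in split_sentences(report):
        for match in TERM_RE.finditer(sentence):
            clause = re.split(r"[,:]", sentence[: match.start()])[-1]
            preceding = " ".join(clause.split()[-window:])
            if not NEG_RE.search(preceding):
                return True
    return False


def positive_predictive_value(predicted: Sequence[bool], reviewed: Sequence[bool]) -> float:
    """PPV = TP / (TP + FP): share of algorithm-flagged reports confirmed by manual review."""
    confirmed = [truth for pred, truth in zip(predicted, reviewed) if pred]
    if not confirmed:
        raise ValueError("no positive predictions; PPV is undefined")
    return sum(confirmed) / len(confirmed)


if __name__ == "__main__":
    # Invented example reports and review labels, not data from the study.
    reports = [
        "Liver: diffuse hepatic steatosis.",
        "No hepatic steatosis.",
        "Mild fatty infiltration of the liver. No ascites.",
        "Normal liver echotexture, no steatosis.",
    ]
    flags = [mentions_steatosis(r) for r in reports]
    print(flags)  # [True, False, True, False]
    reviewed = [True, False, True, True]  # hypothetical physician labels
    print(positive_predictive_value(flags, reviewed))  # 1.0
```

The comma-scoped negation window is a deliberately crude stand-in for real clinical negation handling. A production pipeline would also need rules for uncertainty ("cannot exclude"), historical mentions and report sections, which is where a dedicated tool such as the one named in the abstract earns its keep.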
Record ID: pubmed-10432816
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: eClinicalMedicine (Articles)
Publication date: 2023-08-09
Rights: © 2023 The Author(s). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).