Natural Language Processing for Information Extraction of Gastric Diseases and Its Application in Large-Scale Clinical Research

Background and Aims: The utility of clinical information from esophagogastroduodenoscopy (EGD) reports has been limited because of its unstructured narrative format. We developed a natural language processing (NLP) pipeline that automatically extracts information about gastric diseases from unstruct...

Bibliographic Details
Main Authors: Song, Gyuseon, Chung, Su Jin, Seo, Ji Yeon, Yang, Sun Young, Jin, Eun Hyo, Chung, Goh Eun, Shim, Sung Ryul, Sa, Soonok, Hong, Moongi Simon, Kim, Kang Hyun, Jang, Eunchan, Lee, Chae Won, Bae, Jung Ho, Han, Hyun Wook
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9181010/
https://www.ncbi.nlm.nih.gov/pubmed/35683353
http://dx.doi.org/10.3390/jcm11112967
_version_ 1784723661866926080
author Song, Gyuseon
Chung, Su Jin
Seo, Ji Yeon
Yang, Sun Young
Jin, Eun Hyo
Chung, Goh Eun
Shim, Sung Ryul
Sa, Soonok
Hong, Moongi Simon
Kim, Kang Hyun
Jang, Eunchan
Lee, Chae Won
Bae, Jung Ho
Han, Hyun Wook
author_sort Song, Gyuseon
collection PubMed
description Background and Aims: The utility of clinical information from esophagogastroduodenoscopy (EGD) reports has been limited because of its unstructured narrative format. We developed a natural language processing (NLP) pipeline that automatically extracts information about gastric diseases from unstructured EGD reports and demonstrated its applicability in clinical research. Methods: An NLP pipeline was developed using 2000 EGD and associated pathology reports that were retrieved from a single healthcare center. The pipeline extracted clinical information, including the presence, location, and size, for 10 gastric diseases from the EGD reports. It was validated with 1000 EGD reports by evaluating sensitivity, positive predictive value (PPV), accuracy, and F1 score. The pipeline was applied to 248,966 EGD reports from 2010–2019 to identify patient demographics and clinical information for 10 gastric diseases. Results: For gastritis information extraction, we achieved an overall sensitivity, PPV, accuracy, and F1 score of 0.966, 0.972, 0.996, and 0.967, respectively. Other gastric diseases, such as ulcers, and neoplastic diseases achieved an overall sensitivity, PPV, accuracy, and F1 score of 0.975, 0.982, 0.999, and 0.978, respectively. The study of EGD data of over 10 years revealed the demographics of patients with gastric diseases by sex and age. In addition, the study identified the extent and locations of gastritis and other gastric diseases, respectively. Conclusions: We demonstrated the feasibility of the NLP pipeline providing an automated extraction of gastric disease information from EGD reports. Incorporating the pipeline can facilitate large-scale clinical research to better understand gastric diseases.
format Online
Article
Text
id pubmed-9181010
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9181010 2022-06-10 Natural Language Processing for Information Extraction of Gastric Diseases and Its Application in Large-Scale Clinical Research Song, Gyuseon Chung, Su Jin Seo, Ji Yeon Yang, Sun Young Jin, Eun Hyo Chung, Goh Eun Shim, Sung Ryul Sa, Soonok Hong, Moongi Simon Kim, Kang Hyun Jang, Eunchan Lee, Chae Won Bae, Jung Ho Han, Hyun Wook J Clin Med Article Background and Aims: The utility of clinical information from esophagogastroduodenoscopy (EGD) reports has been limited because of its unstructured narrative format. We developed a natural language processing (NLP) pipeline that automatically extracts information about gastric diseases from unstructured EGD reports and demonstrated its applicability in clinical research. Methods: An NLP pipeline was developed using 2000 EGD and associated pathology reports that were retrieved from a single healthcare center. The pipeline extracted clinical information, including the presence, location, and size, for 10 gastric diseases from the EGD reports. It was validated with 1000 EGD reports by evaluating sensitivity, positive predictive value (PPV), accuracy, and F1 score. The pipeline was applied to 248,966 EGD reports from 2010–2019 to identify patient demographics and clinical information for 10 gastric diseases. Results: For gastritis information extraction, we achieved an overall sensitivity, PPV, accuracy, and F1 score of 0.966, 0.972, 0.996, and 0.967, respectively. Other gastric diseases, such as ulcers, and neoplastic diseases achieved an overall sensitivity, PPV, accuracy, and F1 score of 0.975, 0.982, 0.999, and 0.978, respectively. The study of EGD data of over 10 years revealed the demographics of patients with gastric diseases by sex and age. In addition, the study identified the extent and locations of gastritis and other gastric diseases, respectively. Conclusions: We demonstrated the feasibility of the NLP pipeline providing an automated extraction of gastric disease information from EGD reports. Incorporating the pipeline can facilitate large-scale clinical research to better understand gastric diseases. MDPI 2022-05-24 /pmc/articles/PMC9181010/ /pubmed/35683353 http://dx.doi.org/10.3390/jcm11112967 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Natural Language Processing for Information Extraction of Gastric Diseases and Its Application in Large-Scale Clinical Research
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9181010/
https://www.ncbi.nlm.nih.gov/pubmed/35683353
http://dx.doi.org/10.3390/jcm11112967