Natural Language Processing for Information Extraction of Gastric Diseases and Its Application in Large-Scale Clinical Research
Background and Aims: The utility of clinical information from esophagogastroduodenoscopy (EGD) reports has been limited because of its unstructured narrative format. We developed a natural language processing (NLP) pipeline that automatically extracts information about gastric diseases from unstruct...
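Main authors: | Song, Gyuseon; Chung, Su Jin; Seo, Ji Yeon; Yang, Sun Young; Jin, Eun Hyo; Chung, Goh Eun; Shim, Sung Ryul; Sa, Soonok; Hong, Moongi Simon; Kim, Kang Hyun; Jang, Eunchan; Lee, Chae Won; Bae, Jung Ho; Han, Hyun Wook |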
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9181010/ https://www.ncbi.nlm.nih.gov/pubmed/35683353 http://dx.doi.org/10.3390/jcm11112967 |
_version_ | 1784723661866926080 |
---|---|
author | Song, Gyuseon Chung, Su Jin Seo, Ji Yeon Yang, Sun Young Jin, Eun Hyo Chung, Goh Eun Shim, Sung Ryul Sa, Soonok Hong, Moongi Simon Kim, Kang Hyun Jang, Eunchan Lee, Chae Won Bae, Jung Ho Han, Hyun Wook |
author_facet | Song, Gyuseon Chung, Su Jin Seo, Ji Yeon Yang, Sun Young Jin, Eun Hyo Chung, Goh Eun Shim, Sung Ryul Sa, Soonok Hong, Moongi Simon Kim, Kang Hyun Jang, Eunchan Lee, Chae Won Bae, Jung Ho Han, Hyun Wook |
author_sort | Song, Gyuseon |
collection | PubMed |
description | Background and Aims: The utility of clinical information from esophagogastroduodenoscopy (EGD) reports has been limited because of its unstructured narrative format. We developed a natural language processing (NLP) pipeline that automatically extracts information about gastric diseases from unstructured EGD reports and demonstrated its applicability in clinical research. Methods: An NLP pipeline was developed using 2000 EGD and associated pathology reports that were retrieved from a single healthcare center. The pipeline extracted clinical information, including the presence, location, and size, for 10 gastric diseases from the EGD reports. It was validated with 1000 EGD reports by evaluating sensitivity, positive predictive value (PPV), accuracy, and F1 score. The pipeline was applied to 248,966 EGD reports from 2010–2019 to identify patient demographics and clinical information for 10 gastric diseases. Results: For gastritis information extraction, we achieved an overall sensitivity, PPV, accuracy, and F1 score of 0.966, 0.972, 0.996, and 0.967, respectively. Other gastric diseases, such as ulcers, and neoplastic diseases achieved an overall sensitivity, PPV, accuracy, and F1 score of 0.975, 0.982, 0.999, and 0.978, respectively. The study of EGD data of over 10 years revealed the demographics of patients with gastric diseases by sex and age. In addition, the study identified the extent and locations of gastritis and other gastric diseases, respectively. Conclusions: We demonstrated the feasibility of the NLP pipeline providing an automated extraction of gastric disease information from EGD reports. Incorporating the pipeline can facilitate large-scale clinical research to better understand gastric diseases. |
format | Online Article Text |
id | pubmed-9181010 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-91810102022-06-10 Natural Language Processing for Information Extraction of Gastric Diseases and Its Application in Large-Scale Clinical Research Song, Gyuseon Chung, Su Jin Seo, Ji Yeon Yang, Sun Young Jin, Eun Hyo Chung, Goh Eun Shim, Sung Ryul Sa, Soonok Hong, Moongi Simon Kim, Kang Hyun Jang, Eunchan Lee, Chae Won Bae, Jung Ho Han, Hyun Wook J Clin Med Article Background and Aims: The utility of clinical information from esophagogastroduodenoscopy (EGD) reports has been limited because of its unstructured narrative format. We developed a natural language processing (NLP) pipeline that automatically extracts information about gastric diseases from unstructured EGD reports and demonstrated its applicability in clinical research. Methods: An NLP pipeline was developed using 2000 EGD and associated pathology reports that were retrieved from a single healthcare center. The pipeline extracted clinical information, including the presence, location, and size, for 10 gastric diseases from the EGD reports. It was validated with 1000 EGD reports by evaluating sensitivity, positive predictive value (PPV), accuracy, and F1 score. The pipeline was applied to 248,966 EGD reports from 2010–2019 to identify patient demographics and clinical information for 10 gastric diseases. Results: For gastritis information extraction, we achieved an overall sensitivity, PPV, accuracy, and F1 score of 0.966, 0.972, 0.996, and 0.967, respectively. Other gastric diseases, such as ulcers, and neoplastic diseases achieved an overall sensitivity, PPV, accuracy, and F1 score of 0.975, 0.982, 0.999, and 0.978, respectively. The study of EGD data of over 10 years revealed the demographics of patients with gastric diseases by sex and age. In addition, the study identified the extent and locations of gastritis and other gastric diseases, respectively. 
Conclusions: We demonstrated the feasibility of the NLP pipeline providing an automated extraction of gastric disease information from EGD reports. Incorporating the pipeline can facilitate large-scale clinical research to better understand gastric diseases. MDPI 2022-05-24 /pmc/articles/PMC9181010/ /pubmed/35683353 http://dx.doi.org/10.3390/jcm11112967 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Song, Gyuseon Chung, Su Jin Seo, Ji Yeon Yang, Sun Young Jin, Eun Hyo Chung, Goh Eun Shim, Sung Ryul Sa, Soonok Hong, Moongi Simon Kim, Kang Hyun Jang, Eunchan Lee, Chae Won Bae, Jung Ho Han, Hyun Wook Natural Language Processing for Information Extraction of Gastric Diseases and Its Application in Large-Scale Clinical Research |
title | Natural Language Processing for Information Extraction of Gastric Diseases and Its Application in Large-Scale Clinical Research |
title_full | Natural Language Processing for Information Extraction of Gastric Diseases and Its Application in Large-Scale Clinical Research |
title_fullStr | Natural Language Processing for Information Extraction of Gastric Diseases and Its Application in Large-Scale Clinical Research |
title_full_unstemmed | Natural Language Processing for Information Extraction of Gastric Diseases and Its Application in Large-Scale Clinical Research |
title_short | Natural Language Processing for Information Extraction of Gastric Diseases and Its Application in Large-Scale Clinical Research |
title_sort | natural language processing for information extraction of gastric diseases and its application in large-scale clinical research |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9181010/ https://www.ncbi.nlm.nih.gov/pubmed/35683353 http://dx.doi.org/10.3390/jcm11112967 |
work_keys_str_mv | AT songgyuseon naturallanguageprocessingforinformationextractionofgastricdiseasesanditsapplicationinlargescaleclinicalresearch AT chungsujin naturallanguageprocessingforinformationextractionofgastricdiseasesanditsapplicationinlargescaleclinicalresearch AT seojiyeon naturallanguageprocessingforinformationextractionofgastricdiseasesanditsapplicationinlargescaleclinicalresearch AT yangsunyoung naturallanguageprocessingforinformationextractionofgastricdiseasesanditsapplicationinlargescaleclinicalresearch AT jineunhyo naturallanguageprocessingforinformationextractionofgastricdiseasesanditsapplicationinlargescaleclinicalresearch AT chunggoheun naturallanguageprocessingforinformationextractionofgastricdiseasesanditsapplicationinlargescaleclinicalresearch AT shimsungryul naturallanguageprocessingforinformationextractionofgastricdiseasesanditsapplicationinlargescaleclinicalresearch AT sasoonok naturallanguageprocessingforinformationextractionofgastricdiseasesanditsapplicationinlargescaleclinicalresearch AT hongmoongisimon naturallanguageprocessingforinformationextractionofgastricdiseasesanditsapplicationinlargescaleclinicalresearch AT kimkanghyun naturallanguageprocessingforinformationextractionofgastricdiseasesanditsapplicationinlargescaleclinicalresearch AT jangeunchan naturallanguageprocessingforinformationextractionofgastricdiseasesanditsapplicationinlargescaleclinicalresearch AT leechaewon naturallanguageprocessingforinformationextractionofgastricdiseasesanditsapplicationinlargescaleclinicalresearch AT baejungho naturallanguageprocessingforinformationextractionofgastricdiseasesanditsapplicationinlargescaleclinicalresearch AT hanhyunwook naturallanguageprocessingforinformationextractionofgastricdiseasesanditsapplicationinlargescaleclinicalresearch |
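
The description field reports sensitivity, positive predictive value (PPV), accuracy, and F1 score for the extraction pipeline. As a reading aid, the sketch below shows how these four metrics are conventionally computed from confusion-matrix counts. The class name `ConfusionCounts`, the function `extraction_metrics`, and the example counts are hypothetical illustrations; they are not taken from the study, and the record does not say how the authors aggregated their "overall" figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for one extracted item (hypothetical container, not from the study)."""
    tp: int  # finding present in the report and extracted
    fp: int  # finding extracted but not present in the report
    fn: int  # finding present in the report but missed
    tn: int  # finding absent and correctly not extracted


def _ratio(numerator: float, denominator: float) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return numerator / denominator if denominator else 0.0


def extraction_metrics(c: ConfusionCounts) -> dict:
    """Standard definitions of the four metrics named in the abstract."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # also called recall
    ppv = _ratio(c.tp, c.tp + c.fp)          # also called precision
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn)
    f1 = _ratio(2 * ppv * sensitivity, ppv + sensitivity)  # harmonic mean
    return {
        "sensitivity": sensitivity,
        "ppv": ppv,
        "accuracy": accuracy,
        "f1": f1,
    }


# Made-up counts for illustration only; these are NOT the study's data.
example = ConfusionCounts(tp=90, fp=10, fn=5, tn=895)
metrics = extraction_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy can sit well above sensitivity and PPV when true negatives dominate, which is one reason F1 is usually reported alongside it.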