Deep learning approach to detection of colonoscopic information from unstructured reports

BACKGROUND: Colorectal cancer is a leading cause of cancer deaths. Several screening tests, such as colonoscopy, can be used to find polyps or colorectal cancer. Colonoscopy reports are often written in unstructured narrative text. The information embedded in the reports can be used for various purp...

Bibliographic Details
Main authors:	Seong, Donghyeong; Choi, Yoon Ho; Shin, Soo-Yong; Yi, Byoung-Kee
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2023
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9903463/ https://www.ncbi.nlm.nih.gov/pubmed/36750932 http://dx.doi.org/10.1186/s12911-023-02121-7

_version_	1784883477577990144
author	Seong, Donghyeong Choi, Yoon Ho Shin, Soo-Yong Yi, Byoung-Kee
author_facet	Seong, Donghyeong Choi, Yoon Ho Shin, Soo-Yong Yi, Byoung-Kee
author_sort	Seong, Donghyeong
collection	PubMed
description	BACKGROUND: Colorectal cancer is a leading cause of cancer deaths. Several screening tests, such as colonoscopy, can be used to find polyps or colorectal cancer. Colonoscopy reports are often written in unstructured narrative text. The information embedded in the reports can be used for various purposes, including colorectal cancer risk prediction, follow-up recommendation, and quality measurement. However, the availability and accessibility of unstructured text data are still insufficient despite the large amounts of accumulated data. We aimed to develop and apply deep learning-based natural language processing (NLP) methods to detect colonoscopic information. METHODS: This study applied several deep learning-based NLP models to colonoscopy reports. Approximately 280,668 colonoscopy reports were extracted from the clinical data warehouse of Samsung Medical Center. For 5,000 reports, procedural information and colonoscopic findings were manually annotated with 17 labels. We compared the long short-term memory (LSTM) and BioBERT model to select the one with the best performance for colonoscopy reports, which was the bidirectional LSTM with conditional random fields. Then, we applied pre-trained word embedding using large unlabeled data (280,668 reports) to the selected model. RESULTS: The NLP model with pre-trained word embedding performed better for most labels than the model with one-hot encoding. The F1 scores for colonoscopic findings were: 0.9564 for lesions, 0.9722 for locations, 0.9809 for shapes, 0.9720 for colors, 0.9862 for sizes, and 0.9717 for numbers. CONCLUSIONS: This study applied deep learning-based clinical NLP models to extract meaningful information from colonoscopy reports. The method in this study achieved promising results that demonstrate it can be applied to various practical purposes.
format	Online Article Text
id	pubmed-9903463
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-9903463 2023-02-08 Deep learning approach to detection of colonoscopic information from unstructured reports Seong, Donghyeong Choi, Yoon Ho Shin, Soo-Yong Yi, Byoung-Kee BMC Med Inform Decis Mak Research BACKGROUND: Colorectal cancer is a leading cause of cancer deaths. Several screening tests, such as colonoscopy, can be used to find polyps or colorectal cancer. Colonoscopy reports are often written in unstructured narrative text. The information embedded in the reports can be used for various purposes, including colorectal cancer risk prediction, follow-up recommendation, and quality measurement. However, the availability and accessibility of unstructured text data are still insufficient despite the large amounts of accumulated data. We aimed to develop and apply deep learning-based natural language processing (NLP) methods to detect colonoscopic information. METHODS: This study applied several deep learning-based NLP models to colonoscopy reports. Approximately 280,668 colonoscopy reports were extracted from the clinical data warehouse of Samsung Medical Center. For 5,000 reports, procedural information and colonoscopic findings were manually annotated with 17 labels. We compared the long short-term memory (LSTM) and BioBERT model to select the one with the best performance for colonoscopy reports, which was the bidirectional LSTM with conditional random fields. Then, we applied pre-trained word embedding using large unlabeled data (280,668 reports) to the selected model. RESULTS: The NLP model with pre-trained word embedding performed better for most labels than the model with one-hot encoding. The F1 scores for colonoscopic findings were: 0.9564 for lesions, 0.9722 for locations, 0.9809 for shapes, 0.9720 for colors, 0.9862 for sizes, and 0.9717 for numbers. CONCLUSIONS: This study applied deep learning-based clinical NLP models to extract meaningful information from colonoscopy reports. The method in this study achieved promising results that demonstrate it can be applied to various practical purposes. BioMed Central 2023-02-07 /pmc/articles/PMC9903463/ /pubmed/36750932 http://dx.doi.org/10.1186/s12911-023-02121-7 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle	Research Seong, Donghyeong Choi, Yoon Ho Shin, Soo-Yong Yi, Byoung-Kee Deep learning approach to detection of colonoscopic information from unstructured reports
title	Deep learning approach to detection of colonoscopic information from unstructured reports
title_full	Deep learning approach to detection of colonoscopic information from unstructured reports
title_fullStr	Deep learning approach to detection of colonoscopic information from unstructured reports
title_full_unstemmed	Deep learning approach to detection of colonoscopic information from unstructured reports
title_short	Deep learning approach to detection of colonoscopic information from unstructured reports
title_sort	deep learning approach to detection of colonoscopic information from unstructured reports
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9903463/ https://www.ncbi.nlm.nih.gov/pubmed/36750932 http://dx.doi.org/10.1186/s12911-023-02121-7
work_keys_str_mv	AT seongdonghyeong deeplearningapproachtodetectionofcolonoscopicinformationfromunstructuredreports AT choiyoonho deeplearningapproachtodetectionofcolonoscopicinformationfromunstructuredreports AT shinsooyong deeplearningapproachtodetectionofcolonoscopicinformationfromunstructuredreports AT yibyoungkee deeplearningapproachtodetectionofcolonoscopicinformationfromunstructuredreports
