
Neural Network Assisted Pathology Case Identification

BACKGROUND: Traditionally, cases for cohort selection and quality assurance purposes are identified through structured query language (SQL) searches matching specific keywords. Recently, several neural network-based natural language processing (NLP) pipelines have emerged as an accurate alternative/...


Bibliographic Details
Main Author: Cheng, Jerome
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8860736/
https://www.ncbi.nlm.nih.gov/pubmed/35242447
http://dx.doi.org/10.1016/j.jpi.2022.100008
_version_ 1784654738209374208
author Cheng, Jerome
author_facet Cheng, Jerome
author_sort Cheng, Jerome
collection PubMed
description BACKGROUND: Traditionally, cases for cohort selection and quality assurance purposes are identified through structured query language (SQL) searches matching specific keywords. Recently, several neural network-based natural language processing (NLP) pipelines have emerged as an accurate alternative/complementary method for case retrieval. METHODS: The diagnosis sections of 1000 pathology reports containing the terms “colon” and “carcinoma” were retrieved from our laboratory information system through a SQL query. Each report was labeled as either positive or negative, with a case considered positive if it was a primary adenocarcinoma of the colon. Negative cases comprised adenocarcinomas from other sites, metastatic adenocarcinomas, benign conditions, rectal cancers, and other cases that do not fit in the primary colonic adenocarcinoma category. The 1000 cases were randomly separated into training, validation, and holdout sets. A convolutional neural network (CNN) model built using Keras (a neural network library) was trained to identify positive cases, and the model was applied to the holdout set to predict the category for each case. RESULTS: The CNN model correctly classified 141 out of 149 primary colonic adenocarcinoma cases and 43 out of 51 negative cases, achieving an accuracy of 92% and an area under the ROC curve (AUC) of 0.957. CONCLUSION: Trained convolutional neural network models, by themselves or as an adjunct to keyword and pattern-based text extraction methods, may be used to search for pathology cases of interest with high accuracy.
format Online
Article
Text
id pubmed-8860736
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-88607362022-03-02 Neural Network Assisted Pathology Case Identification Cheng, Jerome J Pathol Inform Short Communication BACKGROUND: Traditionally, cases for cohort selection and quality assurance purposes are identified through structured query language (SQL) searches matching specific keywords. Recently, several neural network-based natural language processing (NLP) pipelines have emerged as an accurate alternative/complementary method for case retrieval. METHODS: The diagnosis sections of 1000 pathology reports containing the terms “colon” and “carcinoma” were retrieved from our laboratory information system through a SQL query. Each report was labeled as either positive or negative, with a case considered positive if it was a primary adenocarcinoma of the colon. Negative cases comprised adenocarcinomas from other sites, metastatic adenocarcinomas, benign conditions, rectal cancers, and other cases that do not fit in the primary colonic adenocarcinoma category. The 1000 cases were randomly separated into training, validation, and holdout sets. A convolutional neural network (CNN) model built using Keras (a neural network library) was trained to identify positive cases, and the model was applied to the holdout set to predict the category for each case. RESULTS: The CNN model correctly classified 141 out of 149 primary colonic adenocarcinoma cases and 43 out of 51 negative cases, achieving an accuracy of 92% and an area under the ROC curve (AUC) of 0.957. CONCLUSION: Trained convolutional neural network models, by themselves or as an adjunct to keyword and pattern-based text extraction methods, may be used to search for pathology cases of interest with high accuracy. Elsevier 2022-01-20 /pmc/articles/PMC8860736/ /pubmed/35242447 http://dx.doi.org/10.1016/j.jpi.2022.100008 Text en © 2022 The Author https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Short Communication
Cheng, Jerome
Neural Network Assisted Pathology Case Identification
title Neural Network Assisted Pathology Case Identification
title_full Neural Network Assisted Pathology Case Identification
title_fullStr Neural Network Assisted Pathology Case Identification
title_full_unstemmed Neural Network Assisted Pathology Case Identification
title_short Neural Network Assisted Pathology Case Identification
title_sort neural network assisted pathology case identification
topic Short Communication
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8860736/
https://www.ncbi.nlm.nih.gov/pubmed/35242447
http://dx.doi.org/10.1016/j.jpi.2022.100008
work_keys_str_mv AT chengjerome neuralnetworkassistedpathologycaseidentification
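
The holdout-set figures quoted in the abstract (141 of 149 positive cases and 43 of 51 negative cases classified correctly) can be checked against the reported 92% accuracy with a few lines of arithmetic. The sketch below is only an illustration and is not taken from the article: the names `HoldoutCounts` and `summarize` are hypothetical, and the sensitivity and specificity values are derived here from the counts rather than stated in the abstract. The AUC of 0.957 cannot be recomputed from these counts alone, since it depends on the per-case prediction scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HoldoutCounts:
    """Confusion-matrix counts for a binary classifier (hypothetical helper)."""
    true_pos: int   # positive cases classified as positive
    false_neg: int  # positive cases classified as negative
    true_neg: int   # negative cases classified as negative
    false_pos: int  # negative cases classified as positive


def summarize(c: HoldoutCounts) -> dict:
    """Return accuracy, sensitivity, and specificity as fractions."""
    total = c.true_pos + c.false_neg + c.true_neg + c.false_pos
    return {
        "accuracy": (c.true_pos + c.true_neg) / total,
        "sensitivity": c.true_pos / (c.true_pos + c.false_neg),
        "specificity": c.true_neg / (c.true_neg + c.false_pos),
    }


# Counts taken from the abstract: 141/149 positives and 43/51 negatives correct,
# so 8 positives and 8 negatives were misclassified.
counts = HoldoutCounts(true_pos=141, false_neg=8, true_neg=43, false_pos=8)
metrics = summarize(counts)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The accuracy works out to (141 + 43) / 200 = 0.92, which agrees with the 92% reported in the abstract.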