
An accessible, efficient, and accurate natural language processing method for extracting diagnostic data from pathology reports

CONTEXT: Analysis of diagnostic information in pathology reports for the purposes of clinical or translational research and quality assessment/control often requires manual data extraction, which can be laborious, time-consuming, and subject to mistakes. OBJECTIVE: We sought to develop, employ, and...


Bibliographic Details
Main Authors: Lam, Hansen; Nguyen, Freddy; Wang, Xintong; Stock, Aryeh; Lenskaya, Volha; Kooshesh, Maryam; Li, Peizi; Qazi, Mohammad; Wang, Shenyu; Dehghan, Mitra; Qian, Xia; Si, Qiusheng; Polydorides, Alexandros D.
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9808011/
https://www.ncbi.nlm.nih.gov/pubmed/36605108
http://dx.doi.org/10.1016/j.jpi.2022.100154
author Lam, Hansen
Nguyen, Freddy
Wang, Xintong
Stock, Aryeh
Lenskaya, Volha
Kooshesh, Maryam
Li, Peizi
Qazi, Mohammad
Wang, Shenyu
Dehghan, Mitra
Qian, Xia
Si, Qiusheng
Polydorides, Alexandros D.
author_sort Lam, Hansen
collection PubMed
description CONTEXT: Analysis of diagnostic information in pathology reports for the purposes of clinical or translational research and quality assessment/control often requires manual data extraction, which can be laborious, time-consuming, and subject to mistakes. OBJECTIVE: We sought to develop, employ, and evaluate a simple, dictionary- and rule-based natural language processing (NLP) algorithm for generating searchable information on various types of parameters from diverse surgical pathology reports. DESIGN: Data were exported from the pathology laboratory information system (LIS) into extensible markup language (XML) documents, which were parsed by NLP-based Python code into desired data points and delivered to Excel spreadsheets. Accuracy and efficiency were compared to a manual data extraction method with concordance measured by Cohen’s κ coefficient and corresponding P values. RESULTS: The automated method was highly concordant (90%–100%, P<.001) with excellent inter-observer reliability (Cohen’s κ: 0.86–1.0) compared to the manual method in 3 clinicopathological research scenarios, including squamous dysplasia presence and grade in anal biopsies, epithelial dysplasia grade and location in colonoscopic surveillance biopsies, and adenocarcinoma grade and amount in prostate core biopsies. Significantly, the automated method was 24–39 times faster and inherently contained links for each diagnosis to additional variables such as patient age, location, etc., which would require additional manual processing time. CONCLUSIONS: A simple, flexible, and scaleable NLP-based platform can be used to correctly, safely, and quickly extract and deliver linked data from pathology reports into searchable spreadsheets for clinical and research purposes.
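The workflow summarized in the abstract (XML export, rule-based parsing in Python, comparison against manual extraction with Cohen's κ) can be illustrated with a minimal sketch. This is not the authors' code: the XML element and attribute names (`report`, `diagnosis`, `id`, `age`), the sample reports, the keyword rules, and the helper names `classify`, `extract_rows`, and `cohen_kappa` are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical LIS export; element and attribute names are assumptions.
SAMPLE_XML = """
<reports>
  <report id="R1" age="54"><diagnosis>Anal biopsy: high-grade squamous intraepithelial lesion (HSIL).</diagnosis></report>
  <report id="R2" age="61"><diagnosis>Anal biopsy: low grade squamous dysplasia.</diagnosis></report>
  <report id="R3" age="47"><diagnosis>Anal biopsy: negative for dysplasia.</diagnosis></report>
</reports>
"""

# Illustrative dictionary of rules, checked in order; the first match wins.
# Negation is checked first so "negative for dysplasia" is not read as a grade.
RULES = [
    ("negative", re.compile(r"\b(negative for|no evidence of)\s+(squamous\s+)?dysplasia", re.I)),
    ("high", re.compile(r"\bhigh[- ]grade\b|\bHSIL\b", re.I)),
    ("low", re.compile(r"\blow[- ]grade\b|\bLSIL\b", re.I)),
]


def classify(text):
    """Return the label of the first rule whose pattern matches the text."""
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return "unclassified"


def extract_rows(xml_text):
    """Parse the XML and return one dict per report, keeping linked variables."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for report in root.iter("report"):
        diagnosis = report.findtext("diagnosis", default="")
        rows.append({
            "id": report.get("id"),
            "age": int(report.get("age")),
            "grade": classify(diagnosis),
        })
    return rows


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length lists of categorical labels."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in set(a) | set(b))
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


rows = extract_rows(SAMPLE_XML)
automated = [row["grade"] for row in rows]
manual = ["high", "low", "low"]  # made-up manual labels for the comparison
kappa = cohen_kappa(automated, manual)
```

Each row keeps the report identifier and patient age next to the extracted grade, which mirrors the linked-variable output the abstract describes. Writing the rows to a spreadsheet (for example with `csv.DictWriter` from the standard library) is left out of the sketch.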
format Online
Article
Text
id pubmed-9808011
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-9808011 2023-01-04 J Pathol Inform Original Research Article Elsevier 2022-11-08 /pmc/articles/PMC9808011/ /pubmed/36605108 http://dx.doi.org/10.1016/j.jpi.2022.100154 Text en © 2022 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title An accessible, efficient, and accurate natural language processing method for extracting diagnostic data from pathology reports
topic Original Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9808011/
https://www.ncbi.nlm.nih.gov/pubmed/36605108
http://dx.doi.org/10.1016/j.jpi.2022.100154