An accessible, efficient, and accurate natural language processing method for extracting diagnostic data from pathology reports
CONTEXT: Analysis of diagnostic information in pathology reports for the purposes of clinical or translational research and quality assessment/control often requires manual data extraction, which can be laborious, time-consuming, and subject to mistakes. OBJECTIVE: We sought to develop, employ, and...
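Main authors: | Lam, Hansen; Nguyen, Freddy; Wang, Xintong; Stock, Aryeh; Lenskaya, Volha; Kooshesh, Maryam; Li, Peizi; Qazi, Mohammad; Wang, Shenyu; Dehghan, Mitra; Qian, Xia; Si, Qiusheng; Polydorides, Alexandros D. |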
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9808011/ https://www.ncbi.nlm.nih.gov/pubmed/36605108 http://dx.doi.org/10.1016/j.jpi.2022.100154 |
_version_ | 1784862840509693952 |
---|---|
author | Lam, Hansen Nguyen, Freddy Wang, Xintong Stock, Aryeh Lenskaya, Volha Kooshesh, Maryam Li, Peizi Qazi, Mohammad Wang, Shenyu Dehghan, Mitra Qian, Xia Si, Qiusheng Polydorides, Alexandros D. |
author_facet | Lam, Hansen Nguyen, Freddy Wang, Xintong Stock, Aryeh Lenskaya, Volha Kooshesh, Maryam Li, Peizi Qazi, Mohammad Wang, Shenyu Dehghan, Mitra Qian, Xia Si, Qiusheng Polydorides, Alexandros D. |
author_sort | Lam, Hansen |
collection | PubMed |
description | CONTEXT: Analysis of diagnostic information in pathology reports for the purposes of clinical or translational research and quality assessment/control often requires manual data extraction, which can be laborious, time-consuming, and subject to mistakes. OBJECTIVE: We sought to develop, employ, and evaluate a simple, dictionary- and rule-based natural language processing (NLP) algorithm for generating searchable information on various types of parameters from diverse surgical pathology reports. DESIGN: Data were exported from the pathology laboratory information system (LIS) into extensible markup language (XML) documents, which were parsed by NLP-based Python code into desired data points and delivered to Excel spreadsheets. Accuracy and efficiency were compared to a manual data extraction method with concordance measured by Cohen’s κ coefficient and corresponding P values. RESULTS: The automated method was highly concordant (90%–100%, P<.001) with excellent inter-observer reliability (Cohen’s κ: 0.86–1.0) compared to the manual method in 3 clinicopathological research scenarios, including squamous dysplasia presence and grade in anal biopsies, epithelial dysplasia grade and location in colonoscopic surveillance biopsies, and adenocarcinoma grade and amount in prostate core biopsies. Significantly, the automated method was 24–39 times faster and inherently contained links for each diagnosis to additional variables such as patient age, location, etc., which would require additional manual processing time. CONCLUSIONS: A simple, flexible, and scaleable NLP-based platform can be used to correctly, safely, and quickly extract and deliver linked data from pathology reports into searchable spreadsheets for clinical and research purposes. |
format | Online Article Text |
id | pubmed-9808011 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-98080112023-01-04 An accessible, efficient, and accurate natural language processing method for extracting diagnostic data from pathology reports Lam, Hansen Nguyen, Freddy Wang, Xintong Stock, Aryeh Lenskaya, Volha Kooshesh, Maryam Li, Peizi Qazi, Mohammad Wang, Shenyu Dehghan, Mitra Qian, Xia Si, Qiusheng Polydorides, Alexandros D. J Pathol Inform Original Research Article CONTEXT: Analysis of diagnostic information in pathology reports for the purposes of clinical or translational research and quality assessment/control often requires manual data extraction, which can be laborious, time-consuming, and subject to mistakes. OBJECTIVE: We sought to develop, employ, and evaluate a simple, dictionary- and rule-based natural language processing (NLP) algorithm for generating searchable information on various types of parameters from diverse surgical pathology reports. DESIGN: Data were exported from the pathology laboratory information system (LIS) into extensible markup language (XML) documents, which were parsed by NLP-based Python code into desired data points and delivered to Excel spreadsheets. Accuracy and efficiency were compared to a manual data extraction method with concordance measured by Cohen’s κ coefficient and corresponding P values. RESULTS: The automated method was highly concordant (90%–100%, P<.001) with excellent inter-observer reliability (Cohen’s κ: 0.86–1.0) compared to the manual method in 3 clinicopathological research scenarios, including squamous dysplasia presence and grade in anal biopsies, epithelial dysplasia grade and location in colonoscopic surveillance biopsies, and adenocarcinoma grade and amount in prostate core biopsies. Significantly, the automated method was 24–39 times faster and inherently contained links for each diagnosis to additional variables such as patient age, location, etc., which would require additional manual processing time. 
CONCLUSIONS: A simple, flexible, and scaleable NLP-based platform can be used to correctly, safely, and quickly extract and deliver linked data from pathology reports into searchable spreadsheets for clinical and research purposes. Elsevier 2022-11-08 /pmc/articles/PMC9808011/ /pubmed/36605108 http://dx.doi.org/10.1016/j.jpi.2022.100154 Text en © 2022 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Original Research Article Lam, Hansen Nguyen, Freddy Wang, Xintong Stock, Aryeh Lenskaya, Volha Kooshesh, Maryam Li, Peizi Qazi, Mohammad Wang, Shenyu Dehghan, Mitra Qian, Xia Si, Qiusheng Polydorides, Alexandros D. An accessible, efficient, and accurate natural language processing method for extracting diagnostic data from pathology reports |
title | An accessible, efficient, and accurate natural language processing method for extracting diagnostic data from pathology reports |
title_full | An accessible, efficient, and accurate natural language processing method for extracting diagnostic data from pathology reports |
title_fullStr | An accessible, efficient, and accurate natural language processing method for extracting diagnostic data from pathology reports |
title_full_unstemmed | An accessible, efficient, and accurate natural language processing method for extracting diagnostic data from pathology reports |
title_short | An accessible, efficient, and accurate natural language processing method for extracting diagnostic data from pathology reports |
title_sort | accessible, efficient, and accurate natural language processing method for extracting diagnostic data from pathology reports |
topic | Original Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9808011/ https://www.ncbi.nlm.nih.gov/pubmed/36605108 http://dx.doi.org/10.1016/j.jpi.2022.100154 |
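
The description field outlines a pipeline in which XML exports are parsed by dictionary- and rule-based Python code and the results are compared with manual extraction using Cohen's κ. The record does not include the authors' code, so the following is only a minimal sketch of that general idea. The XML layout (`report`, `age`, `diagnosis`), the sample reports, the rule list, and the function names are all hypothetical and chosen for illustration; writing the rows to an Excel spreadsheet is omitted.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical XML layout and sample text: the real LIS export schema
# is not given in the record.
SAMPLE_XML = """<reports>
  <report id="R1"><age>54</age><diagnosis>Anal biopsy: high-grade squamous dysplasia.</diagnosis></report>
  <report id="R2"><age>61</age><diagnosis>Anal biopsy: low grade squamous intraepithelial lesion.</diagnosis></report>
  <report id="R3"><age>47</age><diagnosis>Anal mucosa, negative for dysplasia.</diagnosis></report>
</reports>"""

# Illustrative rule dictionary, checked in order; the first match wins.
# Negation is tested first so "negative for dysplasia" is not misread.
RULES = [
    ("negative", re.compile(r"\b(negative for|no evidence of)\s+dysplasia", re.I)),
    ("high", re.compile(r"\bhigh[- ]grade\b", re.I)),
    ("low", re.compile(r"\blow[- ]grade\b", re.I)),
]


def classify(text):
    """Return the label of the first rule that matches the diagnosis text."""
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return "unclassified"


def extract(xml_text):
    """Parse the XML and return one row per report, with linked variables."""
    root = ET.fromstring(xml_text)
    rows = []
    for report in root.iter("report"):
        rows.append({
            "id": report.get("id"),
            "age": int(report.findtext("age")),
            "grade": classify(report.findtext("diagnosis", default="")),
        })
    return rows


def cohen_kappa(a, b):
    """Cohen's kappa for two equally long lists of categorical labels."""
    n = len(a)
    labels = set(a) | set(b)
    p_o = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    p_e = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)  # chance agreement
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)


rows = extract(SAMPLE_XML)
automated = [row["grade"] for row in rows]
manual = ["high", "low", "low"]  # made-up manual labels, for illustration only
kappa = cohen_kappa(automated, manual)
```

Each row keeps the report identifier and age alongside the extracted grade, which mirrors the record's point that every diagnosis stays linked to additional variables. A real rule set would need to be far larger and validated against manually reviewed reports.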