Development and validation of an automated basal cell carcinoma histopathology information extraction system using natural language processing

INTRODUCTION: Routinely collected healthcare data are a powerful research resource, but often lack detailed disease-specific information that is collected in clinical free text such as histopathology reports. We aim to use natural language processing (NLP) techniques to extract detailed clinical and...

Bibliographic Details
Main authors: Ali, Stephen R., Strafford, Huw, Dobbs, Thomas D., Fonferko-Shadrach, Beata, Lacey, Arron S., Pickrell, William Owen, Hutchings, Hayley A., Whitaker, Iain S.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Surgery
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9683031/
https://www.ncbi.nlm.nih.gov/pubmed/36439548
http://dx.doi.org/10.3389/fsurg.2022.870494
author Ali, Stephen R.
Strafford, Huw
Dobbs, Thomas D.
Fonferko-Shadrach, Beata
Lacey, Arron S.
Pickrell, William Owen
Hutchings, Hayley A.
Whitaker, Iain S.
author_sort Ali, Stephen R.
collection PubMed
description INTRODUCTION: Routinely collected healthcare data are a powerful research resource, but often lack detailed disease-specific information that is collected in clinical free text such as histopathology reports. We aim to use natural language processing (NLP) techniques to extract detailed clinical and pathological information from histopathology reports to enrich routinely collected data. METHODS: We used the General Architecture for Text Engineering (GATE) framework to build an NLP information extraction system using rule-based techniques. During validation, we deployed our rule-based NLP pipeline on 200 previously unseen, de-identified and pseudonymised basal cell carcinoma (BCC) histopathological reports from Swansea Bay University Health Board, Wales, UK. The results of our algorithm were compared with gold standard human annotation by two independent and blinded expert clinicians involved in skin cancer care. RESULTS: We identified 11,224 items of information with a mean precision, recall, and F1 score of 86.0% (95% CI: 75.1–96.9), 84.2% (95% CI: 72.8–96.1), and 84.5% (95% CI: 73.0–95.1), respectively. The difference between clinician annotator F1 scores was 7.9% in comparison with 15.5% between the NLP pipeline and the gold standard corpus. Cohen's Kappa score on annotated tokens was 0.85. CONCLUSION: Using an NLP rule-based approach for named entity recognition in BCC, we have been able to develop and validate a pipeline with a potential application in improving the quality of cancer registry data, supporting service planning, and enhancing the quality of routinely collected data for research.
format Online
Article
Text
id pubmed-9683031
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9683031 2022-11-24 Development and validation of an automated basal cell carcinoma histopathology information extraction system using natural language processing Ali, Stephen R. Strafford, Huw Dobbs, Thomas D. Fonferko-Shadrach, Beata Lacey, Arron S. Pickrell, William Owen Hutchings, Hayley A. Whitaker, Iain S. Front Surg Surgery Frontiers Media S.A. 2022-08-24 /pmc/articles/PMC9683031/ /pubmed/36439548 http://dx.doi.org/10.3389/fsurg.2022.870494 Text en © 2022 Ali, Strafford, Dobbs, Fonferko-Shadrach, Lacey, Pickrell, Hutchings and Whitaker. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) (https://creativecommons.org/licenses/by/4.0/). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Development and validation of an automated basal cell carcinoma histopathology information extraction system using natural language processing
topic Surgery
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9683031/
https://www.ncbi.nlm.nih.gov/pubmed/36439548
http://dx.doi.org/10.3389/fsurg.2022.870494