Web Application for the Automated Extraction of Diagnosis and Site From Pathology Reports for Keratinocyte Cancers

PURPOSE: Keratinocyte cancers are exceedingly common in high-risk populations, but accurate measures of incidence are seldom derived because the burden of manually reviewing pathology reports to extract relevant diagnostic information is excessive. Thus, we sought to develop supervised learning algo...

Bibliographic Details
Main authors: Thompson, Bridie S., Hardy, Sam, Pandeya, Nirmala, Dusingize, Jean Claude, Green, Adele C., Millane, Athon, Bourke, Daniel, Grande, Ronald, Bean, Cameron D., Olsen, Catherine M., Whiteman, David C.
Format: Online Article Text
Language: English
Published: American Society of Clinical Oncology 2020
Subjects: Original Reports
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7469600/
https://www.ncbi.nlm.nih.gov/pubmed/32755460
http://dx.doi.org/10.1200/CCI.19.00152
collection PubMed
description
PURPOSE: Keratinocyte cancers are exceedingly common in high-risk populations, but accurate measures of incidence are seldom derived because the burden of manually reviewing pathology reports to extract relevant diagnostic information is excessive. Thus, we sought to develop supervised learning algorithms for classifying basal and squamous cell carcinomas and other diagnoses, as well as disease site, and incorporate these into a Web application capable of processing large numbers of pathology reports.
METHODS: Participants in the QSkin study were recruited in 2011 and comprised men and women age 40-69 years at baseline (N = 43,794) who were randomly selected from a population register in Queensland, Australia. Histologic data were manually extracted from free-text pathology reports for participants with histologically confirmed keratinocyte cancers for whom a pathology report was available (n = 25,786 reports). This provided a training data set for the development of algorithms capable of deriving diagnosis and site from free-text pathology reports. We calculated agreement statistics between algorithm-derived classifications and 3 independent validation data sets of manually abstracted pathology reports.
RESULTS: The agreement for classifications of basal cell carcinoma (κ = 0.97 and κ = 0.96) and squamous cell carcinoma (κ = 0.93 for both) was almost perfect in 2 validation data sets but was slightly lower for a third (κ = 0.82 and κ = 0.90, respectively). Agreement for total counts of specific diagnoses was also high (κ > 0.8). Similar levels of agreement between algorithm-derived and manually extracted data were observed for classifications of keratoacanthoma and intraepidermal carcinoma.
CONCLUSION: Supervised learning methods were used to develop a Web application capable of accurately and rapidly classifying large numbers of pathology reports for keratinocyte cancers and related diagnoses. Such tools may provide the means to accurately measure subtype-specific skin cancer incidence.
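The abstract reports agreement between algorithm-derived and manually abstracted classifications as κ values. As an illustration only, the sketch below computes Cohen's kappa for two label sequences in plain Python. It assumes the reported κ is Cohen's kappa, which the record does not state; the function name `cohen_kappa` and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(manual, algorithm):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(manual) != len(algorithm) or not manual:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(manual)
    # Observed agreement: share of reports where both sources give the same label.
    observed = sum(m == a for m, a in zip(manual, algorithm)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    manual_counts = Counter(manual)
    algo_counts = Counter(algorithm)
    expected = sum(manual_counts[label] * algo_counts[label]
                   for label in manual_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both sources used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy labels, NOT data from the study.
manual = ["BCC", "BCC", "SCC", "other", "SCC", "BCC"]
algorithm = ["BCC", "BCC", "SCC", "SCC", "SCC", "BCC"]
kappa = cohen_kappa(manual, algorithm)
print(round(kappa, 3))  # 0.714
```

In this toy case the two sources agree on 5 of 6 reports (observed agreement 5/6) while the marginal frequencies give a chance agreement of 15/36, so κ = 5/7. Kappa discounts the agreement expected by chance, which is why it is lower than the raw 83% agreement.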
id pubmed-7469600
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-7469600 2021-08-05. JCO Clin Cancer Inform, Original Reports. American Society of Clinical Oncology, 2020-08-05. Text en. © 2020 by American Society of Clinical Oncology. Creative Commons Attribution Non-Commercial No Derivatives 4.0 License: https://creativecommons.org/licenses/by-nc-nd/4.0/
topic Original Reports