Automatic information extraction from childhood cancer pathology reports

Bibliographic Details
Main authors: Yoon, Hong-Jun; Peluso, Alina; Durbin, Eric B; Wu, Xiao-Cheng; Stroup, Antoinette; Doherty, Jennifer; Schwartz, Stephen; Wiggins, Charles; Coyle, Linda; Penberthy, Lynne
Format: Online Article Text
Language: English
Published: Oxford University Press, 2022
Subjects: Research and Applications
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9202570/
https://www.ncbi.nlm.nih.gov/pubmed/35721398
http://dx.doi.org/10.1093/jamiaopen/ooac049

Abstract

OBJECTIVES: The International Classification of Childhood Cancer (ICCC) facilitates the effective classification of a heterogeneous group of cancers in the important pediatric population. However, no machine learning models had been developed for ICCC classification. We have developed deep learning-based models that extract information from cancer pathology reports based on the ICD-O-3 coding standard. In this article, we describe extending these models to perform ICCC classification.

MATERIALS AND METHODS: We developed 2 models, ICD-O-3 classification followed by ICCC recoding (Model 1) and direct ICCC classification (Model 2), and defined 4 scenarios that differ in training sample size. The models were evaluated on a corpus of 29 206 reports from 6 state cancer registries, covering patients aged 0 to 19 years at diagnosis.

RESULTS: Our findings suggest that direct ICCC classification (Model 2) performs substantially better than reusing the ICD-O-3 classification model (Model 1). When an uncertainty quantification mechanism was applied to assess the algorithm's confidence in assigning a code, the model achieved a micro-F1 score of 0.987 while abstaining (that is, not being sufficiently confident to assign a code) on only 14.8% of ambiguous pathology reports.

CONCLUSIONS: Our experimental results suggest that machine learning-based automatic information extraction from childhood cancer pathology reports using the ICCC is a reliable means of supplementing human annotators at state cancer registries, since it reads and abstracts the majority of childhood cancer pathology reports accurately and reliably.
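The RESULTS paragraph reports a micro-F1 score together with an abstention rate. The following minimal Python sketch shows how such a pair of numbers can be computed. It assumes a simple maximum-softmax confidence threshold as a stand-in abstention rule; this record does not describe the authors' actual uncertainty quantification mechanism, and the function names, threshold values, and toy scores below are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one report's class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_abstention(logits: Sequence[float], threshold: float) -> Optional[int]:
    """Return the most probable class index, or None (abstain) if its
    probability falls below `threshold`.

    Hypothetical stand-in: max-softmax thresholding, not the paper's mechanism.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None


def micro_f1_with_abstention(
    y_true: Sequence[int], y_pred: Sequence[Optional[int]]
) -> Tuple[float, float]:
    """Micro-F1 over reports the model did not abstain on, plus the abstention rate."""
    tp = fp = fn = 0
    kept = 0
    for t, p in zip(y_true, y_pred):
        if p is None:
            continue  # abstained reports would go to human annotators; not scored
        kept += 1
        if p == t:
            tp += 1
        else:
            fp += 1  # a false positive for the predicted class
            fn += 1  # and a false negative for the true class
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    abstention_rate = 1 - kept / len(y_true) if y_true else 0.0
    return f1, abstention_rate


if __name__ == "__main__":
    # Toy class scores for 4 reports over 3 hypothetical classes.
    logits = [[4.0, 0.5, 0.1], [0.2, 0.3, 0.25], [0.1, 3.0, 0.2], [2.5, 0.1, 2.4]]
    y_true = [0, 2, 1, 2]
    for threshold in (0.7, 0.5):
        preds = [predict_with_abstention(l, threshold) for l in logits]
        f1, rate = micro_f1_with_abstention(y_true, preds)
        # 0.7 -> f1 = 1.0, rate = 0.5; 0.5 -> f1 ~ 0.667, rate = 0.25
        print(f"threshold={threshold}: micro-F1={f1:.3f}, abstained={rate:.0%}")
```

Lowering the threshold makes the model answer more reports but admits a wrong prediction, so micro-F1 drops as the abstention rate falls; this is the trade-off the reported 0.987 / 14.8% pair summarizes. Because each scored report receives exactly one code, micro-F1 here equals accuracy on the non-abstained reports.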

Journal: JAMIA Open, Research and Applications section. Oxford University Press, 2022-06-16.
Rights: © The Author(s) 2022. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.