Automatic information extraction from childhood cancer pathology reports
OBJECTIVES: The International Classification of Childhood Cancer (ICCC) facilitates the effective classification of a heterogeneous group of cancers in the important pediatric population. However, there has been no development of machine learning models for the ICCC classification. We developed deep...
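Main authors: | Yoon, Hong-Jun; Peluso, Alina; Durbin, Eric B; Wu, Xiao-Cheng; Stroup, Antoinette; Doherty, Jennifer; Schwartz, Stephen; Wiggins, Charles; Coyle, Linda; Penberthy, Lynne |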
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2022 |
Subjects: | Research and Applications |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9202570/ https://www.ncbi.nlm.nih.gov/pubmed/35721398 http://dx.doi.org/10.1093/jamiaopen/ooac049 |
_version_ | 1784728557299171328 |
---|---|
author | Yoon, Hong-Jun; Peluso, Alina; Durbin, Eric B; Wu, Xiao-Cheng; Stroup, Antoinette; Doherty, Jennifer; Schwartz, Stephen; Wiggins, Charles; Coyle, Linda; Penberthy, Lynne |
author_sort | Yoon, Hong-Jun |
collection | PubMed |
description | OBJECTIVES: The International Classification of Childhood Cancer (ICCC) facilitates the effective classification of a heterogeneous group of cancers in the important pediatric population. However, there has been no development of machine learning models for the ICCC classification. We developed deep learning-based information extraction models from cancer pathology reports based on the ICD-O-3 coding standard. In this article, we describe extending the models to perform ICCC classification. MATERIALS AND METHODS: We developed 2 models, ICD-O-3 classification and ICCC recoding (Model 1) and direct ICCC classification (Model 2), and 4 scenarios subject to the training sample size. We evaluated these models with a corpus consisting of 29 206 reports with age at diagnosis between 0 and 19 from 6 state cancer registries. RESULTS: Our findings suggest that the direct ICCC classification (Model 2) is substantially better than reusing the ICD-O-3 classification model (Model 1). Applying the uncertainty quantification mechanism to assess the confidence of the algorithm in assigning a code demonstrated that the model achieved a micro-F1 score of 0.987 while abstaining (not sufficiently confident to assign a code) on only 14.8% of ambiguous pathology reports. CONCLUSIONS: Our experimental results suggest that the machine learning-based automatic information extraction from childhood cancer pathology reports in the ICCC is a reliable means of supplementing human annotators at state cancer registries by reading and abstracting the majority of the childhood cancer pathology reports accurately and reliably. |
format | Online Article Text |
id | pubmed-9202570 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9202570, 2022-06-17. JAMIA Open, Research and Applications. Oxford University Press, 2022-06-16. Text en. © The Author(s) 2022. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Automatic information extraction from childhood cancer pathology reports |
topic | Research and Applications |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9202570/ https://www.ncbi.nlm.nih.gov/pubmed/35721398 http://dx.doi.org/10.1093/jamiaopen/ooac049 |
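
The description in this record reports a micro-F1 score together with an abstention rate, but it does not say how the uncertainty quantification mechanism works. The sketch below is only one plausible reading, assuming a simple confidence threshold on per-class probabilities; the function names, the threshold value, and the toy data are all hypothetical and are not taken from the article.

```python
from typing import Optional, Sequence, Tuple


def predict_or_abstain(probs: Sequence[float], threshold: float) -> Optional[int]:
    """Return the index of the most probable class, or None to abstain.

    Assumption: confidence is the top class probability; the article's
    actual uncertainty measure is not given in this record.
    """
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best if probs[best] >= threshold else None


def evaluate(
    prob_rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    threshold: float,
) -> Tuple[float, float]:
    """Return (micro-F1 on retained reports, abstention rate)."""
    if not labels or len(prob_rows) != len(labels):
        raise ValueError("prob_rows and labels must be non-empty and aligned")
    preds = [predict_or_abstain(p, threshold) for p in prob_rows]
    kept = [(p, y) for p, y in zip(preds, labels) if p is not None]
    abstention_rate = 1.0 - len(kept) / len(labels)
    if not kept:
        return 0.0, abstention_rate
    # For single-label multiclass predictions, micro-F1 equals accuracy.
    micro_f1 = sum(p == y for p, y in kept) / len(kept)
    return micro_f1, abstention_rate


# Hypothetical toy data: four reports, three classes.
toy_probs = [
    [0.95, 0.03, 0.02],  # confident and correct
    [0.40, 0.35, 0.25],  # below threshold, so the model abstains
    [0.10, 0.85, 0.05],  # confident and correct
    [0.05, 0.15, 0.80],  # confident but wrong
]
toy_labels = [0, 1, 1, 0]

micro_f1, abstained = evaluate(toy_probs, toy_labels, threshold=0.7)
print(f"micro-F1 on retained reports: {micro_f1:.3f}, abstained: {abstained:.2%}")
```

Raising the threshold in this sketch trades coverage for accuracy: more reports are routed to human annotators, and the score on the retained reports is computed over a smaller, more confident subset.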