
Automatic extraction of cancer registry reportable information from free-text pathology reports using multitask convolutional neural networks

OBJECTIVE: We implement 2 different multitask learning (MTL) techniques, hard parameter sharing and cross-stitch, to train a word-level convolutional neural network (CNN) specifically designed for automatic extraction of cancer data from unstructured text in pathology reports. We show the importance...
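The parameter economy of hard parameter sharing can be illustrated with a rough count. The sketch below assumes a generic word-level CNN (an embedding table, parallel 1-D convolutions, max pooling, and one softmax layer per task). The vocabulary size, embedding width, filter widths, and filter counts are hypothetical values chosen for illustration and are not taken from the paper; only the per-task class counts come from the full abstract.

```python
# Hypothetical hyperparameters, chosen only for illustration (not from the paper).
VOCAB_SIZE = 20_000
EMBED_DIM = 300
FILTER_WIDTHS = (3, 4, 5)
FILTERS_PER_WIDTH = 100

# Class counts per task, as listed in the full abstract.
TASK_CLASSES = {
    "site": 65,
    "laterality": 4,
    "behavior": 3,
    "histology": 63,
    "grade": 5,
}


def trunk_params():
    """Embedding table plus parallel 1-D convolutions (weights and biases)."""
    embedding = VOCAB_SIZE * EMBED_DIM
    convs = sum(
        width * EMBED_DIM * FILTERS_PER_WIDTH + FILTERS_PER_WIDTH
        for width in FILTER_WIDTHS
    )
    return embedding + convs


def head_params(n_classes):
    """One dense softmax layer on top of the max-pooled convolution features."""
    features = len(FILTER_WIDTHS) * FILTERS_PER_WIDTH
    return features * n_classes + n_classes


def single_task_params(task):
    """A stand-alone CNN: its own trunk plus one task head."""
    return trunk_params() + head_params(TASK_CLASSES[task])


def hard_sharing_params():
    """One shared trunk feeding all five task heads."""
    return trunk_params() + sum(head_params(c) for c in TASK_CLASSES.values())


def separate_models_params():
    """Five independent single-task CNNs, one per task."""
    return sum(single_task_params(task) for task in TASK_CLASSES)


if __name__ == "__main__":
    print("single-task (site):", single_task_params("site"))
    print("hard sharing, 5 tasks:", hard_sharing_params())
    print("5 separate CNNs:", separate_models_params())
```

Under these assumed sizes the shared trunk dominates the count, so adding four more task heads changes the total by well under 1%, whereas training five separate CNNs multiplies it roughly fivefold.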


Bibliographic Details
Main Authors:	Alawad, Mohammed; Gao, Shang; Qiu, John X; Yoon, Hong Jun; Blair Christian, J; Penberthy, Lynne; Mumphrey, Brent; Wu, Xiao-Cheng; Coyle, Linda; Tourassi, Georgia
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2019
Subjects:	Research and Applications
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7489089/ https://www.ncbi.nlm.nih.gov/pubmed/31710668 http://dx.doi.org/10.1093/jamia/ocz153

author	Alawad, Mohammed; Gao, Shang; Qiu, John X; Yoon, Hong Jun; Blair Christian, J; Penberthy, Lynne; Mumphrey, Brent; Wu, Xiao-Cheng; Coyle, Linda; Tourassi, Georgia
author_sort	Alawad, Mohammed
collection	PubMed
description	OBJECTIVE: We implement 2 different multitask learning (MTL) techniques, hard parameter sharing and cross-stitch, to train a word-level convolutional neural network (CNN) specifically designed for automatic extraction of cancer data from unstructured text in pathology reports. We show the importance of learning related information extraction (IE) tasks leveraging shared representations across the tasks to achieve state-of-the-art performance in classification accuracy and computational efficiency. MATERIALS AND METHODS: Multitask CNN (MTCNN) attempts to tackle document information extraction by learning to extract multiple key cancer characteristics simultaneously. We trained our MTCNN to perform 5 information extraction tasks: (1) primary cancer site (65 classes), (2) laterality (4 classes), (3) behavior (3 classes), (4) histological type (63 classes), and (5) histological grade (5 classes). We evaluated the performance on a corpus of 95 231 pathology documents (71 223 unique tumors) obtained from the Louisiana Tumor Registry. We compared the performance of the MTCNN models against single-task CNN models and 2 traditional machine learning approaches, namely support vector machine (SVM) and random forest classifier (RFC). RESULTS: MTCNNs offered superior performance across all 5 tasks in terms of classification accuracy as compared with the other machine learning models. Based on retrospective evaluation, the hard parameter sharing and cross-stitch MTCNN models correctly classified 59.04% and 57.93% of the pathology reports respectively across all 5 tasks. The baseline models achieved 53.68% (CNN), 46.37% (RFC), and 36.75% (SVM). Based on prospective evaluation, the percentages of correctly classified cases across the 5 tasks were 60.11% (hard parameter sharing), 58.13% (cross-stitch), 51.30% (single-task CNN), 42.07% (RFC), and 35.16% (SVM). Moreover, hard parameter sharing MTCNNs outperformed the other models in computational efficiency by using about the same number of trainable parameters as a single-task CNN. CONCLUSIONS: The hard parameter sharing MTCNN offers superior classification accuracy for automated coding support of pathology documents across a wide range of cancers and multiple information extraction tasks while maintaining similar training and inference time as those of a single task–specific model.
format	Online Article Text
id	pubmed-7489089
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-7489089 2020-09-16 J Am Med Inform Assoc Research and Applications Oxford University Press 2019-11-09 /pmc/articles/PMC7489089/ /pubmed/31710668 http://dx.doi.org/10.1093/jamia/ocz153 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of the American Medical Informatics Association. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Automatic extraction of cancer registry reportable information from free-text pathology reports using multitask convolutional neural networks
topic	Research and Applications
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7489089/ https://www.ncbi.nlm.nih.gov/pubmed/31710668 http://dx.doi.org/10.1093/jamia/ocz153
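The abstract reports the percentage of pathology reports classified correctly "across all 5 tasks". One plausible reading, assumed here, is an exact-match score: a report counts as correct only when all five predicted labels agree with the registry labels. The sketch below computes that score next to the usual per-task accuracy. The task keys and the label values in the toy data are hypothetical and do not come from the paper or the Louisiana Tumor Registry.

```python
TASKS = ("site", "laterality", "behavior", "histology", "grade")


def all_tasks_accuracy(predictions, labels):
    """Fraction of reports whose prediction matches the label on every task."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    fully_correct = sum(
        all(pred[task] == gold[task] for task in TASKS)
        for pred, gold in zip(predictions, labels)
    )
    return fully_correct / len(labels)


def per_task_accuracy(predictions, labels):
    """Accuracy computed independently for each task."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return {task: 0.0 for task in TASKS}
    return {
        task: sum(p[task] == g[task] for p, g in zip(predictions, labels)) / len(labels)
        for task in TASKS
    }


# Toy data with made-up label values, for illustration only.
gold = [
    {"site": "C50", "laterality": "left", "behavior": "malignant", "histology": "8500", "grade": "2"},
    {"site": "C34", "laterality": "right", "behavior": "malignant", "histology": "8070", "grade": "2"},
    {"site": "C61", "laterality": "none", "behavior": "malignant", "histology": "8140", "grade": "1"},
    {"site": "C50", "laterality": "right", "behavior": "in situ", "histology": "8500", "grade": "9"},
]
preds = [dict(report) for report in gold]
preds[1]["grade"] = "3"         # one task wrong on report 1
preds[3]["site"] = "C18"        # two tasks wrong on report 3
preds[3]["histology"] = "8140"

if __name__ == "__main__":
    print("all-task accuracy:", all_tasks_accuracy(preds, gold))
    print("per-task accuracy:", per_task_accuracy(preds, gold))
```

On this toy data only half of the reports are fully correct even though every individual task is at least 75% accurate, which shows why an all-task score of this kind sits well below typical single-task accuracies.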


Similar Items