
Integrated Natural Language Processing and Machine Learning Models for Standardizing Radiotherapy Structure Names

The lack of standardized structure names in radiotherapy (RT) data limits interoperability, data sharing, and the ability to perform big data analysis. To standardize radiotherapy structure names, we developed an integrated natural language processing (NLP) and machine learning (ML) based system tha...


Bibliographic Details
Main authors:	Syed, Khajamoinuddin; Sleeman IV, William; Ivey, Kevin; Hagan, Michael; Palta, Jatinder; Kapoor, Rishabh; Ghosh, Preetam
Format:	Online Article Text
Language:	English
Published:	MDPI 2020
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7348919/ https://www.ncbi.nlm.nih.gov/pubmed/32365973 http://dx.doi.org/10.3390/healthcare8020120

author	Syed, Khajamoinuddin; Sleeman IV, William; Ivey, Kevin; Hagan, Michael; Palta, Jatinder; Kapoor, Rishabh; Ghosh, Preetam
author_sort	Syed, Khajamoinuddin
collection	PubMed
description	The lack of standardized structure names in radiotherapy (RT) data limits interoperability, data sharing, and the ability to perform big data analysis. To standardize radiotherapy structure names, we developed an integrated natural language processing (NLP) and machine learning (ML) based system that can map the physician-given structure names to American Association of Physicists in Medicine (AAPM) Task Group 263 (TG-263) standard names. The dataset consists of 794 prostate and 754 lung cancer patients across 40 different radiation therapy centers managed by the Veterans Health Administration (VA). Additionally, data from the Radiation Oncology department at Virginia Commonwealth University (VCU) was collected to serve as a test set. Domain experts identified nine prostate and ten lung organs-at-risk (OAR) structures as anatomically significant and manually labeled them according to the TG-263 standards; the remaining structures were labeled as Non_OAR. We experimented with six different classification algorithms and three feature vector methods, and the final model was built with the fastText algorithm. Multiple validation techniques were used to assess the robustness of the proposed methodology. The macro-averaged F1 score was used as the main evaluation metric. The model achieved an F1 score of 0.97 on prostate structures and 0.99 on lung structures from the VA dataset. The model also performed well on the test (VCU) dataset, achieving an F1 score of 0.93 on prostate structures and 0.95 on lung structures. In this work, we demonstrate that NLP and ML based approaches can be used to standardize the physician-given RT structure names with high fidelity. This standardization can help with big data analytics in the radiation therapy domain using population-derived datasets, including standardization of the treatment planning process, clinical decision support systems, treatment quality improvement programs, and hypothesis-driven clinical research.
format	Online Article Text
id	pubmed-7348919
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-7348919 2020-07-22 Healthcare (Basel) Article MDPI 2020-04-30 /pmc/articles/PMC7348919/ /pubmed/32365973 http://dx.doi.org/10.3390/healthcare8020120 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title	Integrated Natural Language Processing and Machine Learning Models for Standardizing Radiotherapy Structure Names
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7348919/ https://www.ncbi.nlm.nih.gov/pubmed/32365973 http://dx.doi.org/10.3390/healthcare8020120
