
Assessing Versatile Machine Learning Models for Glioma Radiogenomic Studies across Hospitals

SIMPLE SUMMARY: Radiogenomics enables prediction of the status and prognosis of patients using non-invasively obtained imaging data. Current machine learning (ML) methods used in radiogenomics require huge datasets, which involve the handling of large heterogeneous datasets from multiple cohorts/hos...


Bibliographic Details
Main authors:	Kawaguchi, Risa K.; Takahashi, Masamichi; Miyake, Mototaka; Kinoshita, Manabu; Takahashi, Satoshi; Ichimura, Koichi; Hamamoto, Ryuji; Narita, Yoshitaka; Sese, Jun
Format:	Online Article Text
Language:	English
Published:	MDPI 2021
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8306149/ https://www.ncbi.nlm.nih.gov/pubmed/34298824 http://dx.doi.org/10.3390/cancers13143611

author	Kawaguchi, Risa K.; Takahashi, Masamichi; Miyake, Mototaka; Kinoshita, Manabu; Takahashi, Satoshi; Ichimura, Koichi; Hamamoto, Ryuji; Narita, Yoshitaka; Sese, Jun
author_sort	Kawaguchi, Risa K.
collection	PubMed
description	SIMPLE SUMMARY: Radiogenomics enables prediction of the status and prognosis of patients using non-invasively obtained imaging data. Current machine learning (ML) methods used in radiogenomics require huge datasets, which involves handling large heterogeneous datasets from multiple cohorts/hospitals. In this study, two different glioma datasets were used to test various ML and image pre-processing methods to confirm whether models trained on one dataset are universally applicable to other datasets. Our results suggested that the ML method that yielded the highest accuracy in a single dataset was likely to be overfitted. We demonstrated that implementing standardization and dimension reduction procedures prior to classification enabled the development of ML methods that are less affected by differences between cohorts. We advocate caution in interpreting the results of radiogenomic studies in which the training and testing datasets are small or mixed, with a view to implementing practical ML methods in radiogenomics. ABSTRACT: Radiogenomics uses non-invasively obtained imaging data, such as magnetic resonance imaging (MRI), to predict critical biomarkers of patients. Developing an accurate machine learning (ML) technique for MRI requires data from hundreds of patients, which cannot be gathered from any single local hospital. Hence, a model universally applicable to multiple cohorts/hospitals is required. We applied various ML and image pre-processing procedures to a glioma dataset from The Cancer Imaging Archive (TCIA, n = 159). The models that showed a high level of accuracy in predicting glioblastoma or WHO Grade II and III glioma using the TCIA dataset were then tested on data from the National Cancer Center Hospital, Japan (NCC, n = 166) to determine whether they could maintain similarly high accuracy.
Results: We confirmed that our ML procedure achieved a level of accuracy (AUROC = 0.904) comparable to that previously shown by deep-learning methods using TCIA. However, when we directly applied the model to the NCC dataset, its AUROC dropped to 0.383. Introducing standardization and dimension reduction procedures before classification, without re-training, improved the prediction accuracy on the NCC dataset (AUROC = 0.804) without a loss in prediction accuracy for the TCIA dataset. Furthermore, we confirmed the same tendency in a model for IDH1/2 mutation prediction: with standardization and dimension reduction, it was also applicable to multiple hospitals. Our results demonstrated that overfitting may occur when the ML method providing the highest accuracy on a small training dataset is applied to different heterogeneous datasets, and suggested a promising process for developing an ML method applicable to multiple cohorts.
format	Online Article Text
id	pubmed-8306149
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-8306149 2021-07-25 Cancers (Basel) Article MDPI 2021-07-19 /pmc/articles/PMC8306149/ /pubmed/34298824 http://dx.doi.org/10.3390/cancers13143611 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Assessing Versatile Machine Learning Models for Glioma Radiogenomic Studies across Hospitals
topic	Article
