
Research on imbalance machine learning methods for MR[Formula: see text] WI soft tissue sarcoma data

BACKGROUND: Soft tissue sarcoma is a rare and highly heterogeneous tumor in clinical practice. Pathological grading of soft tissue sarcoma is a key factor in patient prognosis and treatment planning, but the clinical data for soft tissue sarcoma are imbalanced. In this paper, we propose an effec...


Bibliographic Details
Main authors: Liu, Xuanxuan, Guo, Li, Wang, Hexiang, Guo, Jia, Yang, Shifeng, Duan, Lisha
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9417078/
https://www.ncbi.nlm.nih.gov/pubmed/36028803
http://dx.doi.org/10.1186/s12880-022-00876-5
_version_ 1784776625652498432
author Liu, Xuanxuan
Guo, Li
Wang, Hexiang
Guo, Jia
Yang, Shifeng
Duan, Lisha
author_sort Liu, Xuanxuan
collection PubMed
description BACKGROUND: Soft tissue sarcoma is a rare and highly heterogeneous tumor in clinical practice. Pathological grading of soft tissue sarcoma is a key factor in patient prognosis and treatment planning, but the clinical data for soft tissue sarcoma are imbalanced. In this paper, we propose an effective solution to find the optimal imbalance machine learning model for predicting the classification of soft tissue sarcoma data. METHODS: A large number of features are first extracted from [Formula: see text] WI images using radiomics methods. We then explore methods of feature selection, sampling and classification, build 17 imbalance machine learning models based on these features, and perform extensive experiments to classify the imbalanced soft tissue sarcoma data. We also used another dataset splitting method, which could improve the classification performance and verify the validity of the models. RESULTS: The experimental results show that the combination of the extremely randomized trees (ERT) classification algorithm with SMOTETomek (STT) sampling and the recursive feature elimination (RFE) technique performs best compared with the other methods. The accuracy of RFE+STT+ERT is 81.57%, which is close to the accuracy of biopsy, and the accuracy is 95.69% when using the other dataset splitting method. CONCLUSION: Accurate and noninvasive preoperative prediction of the pathological grade of soft tissue sarcoma is essential. Our proposed machine learning method (RFE+STT+ERT) can make a positive contribution to solving the imbalanced data classification problem, which can favorably support the development of personalized treatment plans for soft tissue sarcoma patients.
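The sampling step named in the abstract, SMOTETomek, combines SMOTE oversampling of the minority class with removal of Tomek links. The following is a minimal standard-library sketch of that idea on made-up 2-D points, not the authors' implementation: the function names, the toy data, the neighbour count `k=2`, and the choice to drop both members of each link are all assumptions for illustration, and the RFE and ERT stages are not shown.

```python
import math
import random


def smote(minority, n_new, k=2, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (the SMOTE idea)."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def nearest(idx, points):
    """Index of the point closest to points[idx], excluding itself."""
    return min(
        (j for j in range(len(points)) if j != idx),
        key=lambda j: math.dist(points[idx], points[j]),
    )


def remove_tomek_links(points, labels):
    """Drop both members of every Tomek link, i.e. every pair of mutual
    nearest neighbours that carry different labels (single pass)."""
    drop = set()
    for i in range(len(points)):
        j = nearest(i, points)
        if labels[i] != labels[j] and nearest(j, points) == i:
            drop.update((i, j))
    keep = [i for i in range(len(points)) if i not in drop]
    return [points[i] for i in keep], [labels[i] for i in keep]


def smote_tomek(points, labels, minority_label, rng=None):
    """Oversample the minority class up to the majority size, then clean."""
    minority = [p for p, y in zip(points, labels) if y == minority_label]
    n_new = max(0, len(points) - 2 * len(minority))  # majority minus minority
    new_points = smote(minority, n_new, rng=rng)
    all_points = list(points) + new_points
    all_labels = list(labels) + [minority_label] * n_new
    return remove_tomek_links(all_points, all_labels)


# Hypothetical toy data: six majority points (label 0), three minority (label 1).
toy_points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.4, 0.4),
              (0.9, 0.9), (1.0, 1.0), (1.2, 1.1), (1.1, 1.3)]
toy_labels = [0, 0, 0, 0, 0, 0, 1, 1, 1]

res_points, res_labels = smote_tomek(toy_points, toy_labels, minority_label=1)
print("class counts after resampling:",
      {y: res_labels.count(y) for y in set(res_labels)})
```

In practice, resampling of this kind is applied only to the training split, so that synthetic points never leak into the data used for evaluation.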
format Online
Article
Text
id pubmed-9417078
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9417078 2022-08-28 Research on imbalance machine learning methods for MR[Formula: see text] WI soft tissue sarcoma data Liu, Xuanxuan Guo, Li Wang, Hexiang Guo, Jia Yang, Shifeng Duan, Lisha BMC Med Imaging Research BACKGROUND: Soft tissue sarcoma is a rare and highly heterogeneous tumor in clinical practice. Pathological grading of soft tissue sarcoma is a key factor in patient prognosis and treatment planning, but the clinical data for soft tissue sarcoma are imbalanced. In this paper, we propose an effective solution to find the optimal imbalance machine learning model for predicting the classification of soft tissue sarcoma data. METHODS: A large number of features are first extracted from [Formula: see text] WI images using radiomics methods. We then explore methods of feature selection, sampling and classification, build 17 imbalance machine learning models based on these features, and perform extensive experiments to classify the imbalanced soft tissue sarcoma data. We also used another dataset splitting method, which could improve the classification performance and verify the validity of the models. RESULTS: The experimental results show that the combination of the extremely randomized trees (ERT) classification algorithm with SMOTETomek (STT) sampling and the recursive feature elimination (RFE) technique performs best compared with the other methods. The accuracy of RFE+STT+ERT is 81.57%, which is close to the accuracy of biopsy, and the accuracy is 95.69% when using the other dataset splitting method. CONCLUSION: Accurate and noninvasive preoperative prediction of the pathological grade of soft tissue sarcoma is essential. Our proposed machine learning method (RFE+STT+ERT) can make a positive contribution to solving the imbalanced data classification problem, which can favorably support the development of personalized treatment plans for soft tissue sarcoma patients.
BioMed Central 2022-08-26 /pmc/articles/PMC9417078/ /pubmed/36028803 http://dx.doi.org/10.1186/s12880-022-00876-5 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Research on imbalance machine learning methods for MR[Formula: see text] WI soft tissue sarcoma data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9417078/
https://www.ncbi.nlm.nih.gov/pubmed/36028803
http://dx.doi.org/10.1186/s12880-022-00876-5