Development of a Machine Learning-Based Screening Method for Thyroid Nodules Classification by Solving the Imbalance Challenge in Thyroid Nodules Data

Bibliographic Details
Main authors: Khodabandelu, Sajad, Ghaemian, Naser, Khafri, Soraya, Ezoji, Mehdi, Khaleghi, Sara
Format: Online Article Text
Language: English
Published: Hamadan University of Medical Sciences 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10422153/
https://www.ncbi.nlm.nih.gov/pubmed/36511373
http://dx.doi.org/10.34172/jrhs.2022.90
collection PubMed
description Background: This study aims to show how imbalanced data and typical evaluation methods affect the development of machine learning-based models for preoperative thyroid nodule screening and can lead to misleading assessments of them. Study design: A retrospective study. Methods: The ultrasonography features of 431 thyroid nodule cases were extracted from the medical records of 313 patients in Babol, Iran. Since thyroid nodules are commonly benign, the relevant data are usually imbalanced across classes, which can bias learning models toward the majority class. To address this, a hybrid SMOTE-based resampling method was used to create balanced data. The support vector classification (SVC) algorithm was then trained in Python on the balanced and unbalanced datasets as Models 2 and 3, respectively. Their performance was compared with that of a conventionally fitted logistic regression model (Model 1). Results: The prevalence of malignant nodules was 14% (n = 61). In addition, 87% of the patients in this study were women; however, the prevalence of malignancy did not differ by gender. The accuracy, area under the curve, and geometric mean values were estimated at 92.1%, 93.2%, and 76.8% for Model 1; 91.3%, 93%, and 77.6% for Model 2; and 91%, 92.6%, and 84.2% for Model 3, respectively. The results also identified microcalcification, taller-than-wide shape, and the absence of isoechogenicity and hyperechogenicity as the variables most indicative of malignancy. Conclusion: Paying attention to data challenges, such as class imbalance, and using appropriate evaluation measures can improve the performance of machine learning models for preoperative thyroid nodule screening.
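The abstract's point about misleading evaluation can be illustrated with a small arithmetic sketch. It uses only the class counts reported above (431 nodules, 61 malignant). The "always benign" classifier is hypothetical and is not one of the study's models. The geometric mean is assumed here to be the square root of sensitivity times specificity, a common definition for imbalanced binary problems; the record itself does not spell out the formula.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all cases that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def geometric_mean(tp, tn, fp, fn):
    """sqrt(sensitivity * specificity); assumed definition, see lead-in."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Class counts taken from the abstract.
N_TOTAL = 431
N_MALIGNANT = 61
N_BENIGN = N_TOTAL - N_MALIGNANT  # 370 benign nodules

# Hypothetical classifier that labels every nodule as benign:
# it never finds a malignant case, yet most of its answers are "right".
always_benign = dict(tp=0, tn=N_BENIGN, fp=0, fn=N_MALIGNANT)

acc = accuracy(**always_benign)
gmean = geometric_mean(**always_benign)
print(f"accuracy = {acc:.3f}, G-mean = {gmean:.3f}")
# prints: accuracy = 0.858, G-mean = 0.000
```

Under these assumptions, a model that detects no malignant nodule at all still reaches about 86% accuracy, while its geometric mean is zero. This is why accuracy alone says little on data with a 14% minority class, and why a measure that combines both class-wise rates is more informative.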
format Online
Article
Text
id pubmed-10422153
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Hamadan University of Medical Sciences
record_format MEDLINE/PubMed
spelling pubmed-10422153 2023-08-13 J Res Health Sci Original Article Hamadan University of Medical Sciences 2022-08-29 /pmc/articles/PMC10422153/ /pubmed/36511373 http://dx.doi.org/10.34172/jrhs.2022.90 Text en © 2022 The Author(s); Published by Hamadan University of Medical Sciences. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Development of a Machine Learning-Based Screening Method for Thyroid Nodules Classification by Solving the Imbalance Challenge in Thyroid Nodules Data
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10422153/
https://www.ncbi.nlm.nih.gov/pubmed/36511373
http://dx.doi.org/10.34172/jrhs.2022.90