Comparative analysis of weka-based classification algorithms on medical diagnosis datasets

Bibliographic Details
Main Authors: Dou, Yifeng, Meng, Wentao
Format: Online Article Text
Language: English
Published: IOS Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10200164/
https://www.ncbi.nlm.nih.gov/pubmed/37066939
http://dx.doi.org/10.3233/THC-236034
_version_ 1785045081715113984
author Dou, Yifeng
Meng, Wentao
collection PubMed
description BACKGROUND: With the advent of 5G and the era of Big Data, a large amount of data, including clinical diagnosis data and hospital management data, has accumulated in hospital database systems, driven by the rapid development of medical information technology around the world, the widespread adoption of electronic medical records, and the digitization of medical equipment and instruments.
OBJECTIVE: This study aimed to examine the classification performance of different machine learning algorithms on medical datasets so as to better explore the value of machine learning methods in aiding medical diagnosis.
METHODS: Classification datasets from four different medical fields in the University of California Irvine machine learning database were used as the research object. Six categories of classification models, based on Bayes' theorem, integrated (ensemble) learning, rule-based, and tree-based ideas, were constructed using the Weka platform.
RESULTS: The between-group experiments showed that the Random Forest algorithm achieved the best results on the Indian liver disease patient dataset (ILPD), delivery cardiotocography (CADG), and lymphatic tractography (LYMP) datasets, followed by Bagging and partition and regression tree. In the within-group comparison experiments, the Bagging algorithm achieved better results than the other algorithms based on the integration idea for 11 metrics across all datasets, mainly on the 2 binary datasets. LogitBoost showed significant performance on only 7 metrics, and the best algorithm was Rotation Forest, with 28 metrics achieving optimal values. Among the algorithms based on tree ideas, the logistic model tree algorithm achieved optimal results on all metrics on the mammographic dataset (MAGR). The classification performance of BFTree, J48, and Random Tree was poor on each dataset. The best algorithm was Random Forest on the ILPD, CADG, and LYMP datasets, with 27 metrics reaching the optimum.
CONCLUSION: Machine learning algorithms have good application value in disease prediction and can provide a reference basis for disease diagnosis.
format Online
Article
Text
id pubmed-10200164
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher IOS Press
record_format MEDLINE/PubMed
spelling pubmed-10200164 2023-05-22 Comparative analysis of weka-based classification algorithms on medical diagnosis datasets Dou, Yifeng Meng, Wentao Technol Health Care Research Article IOS Press 2023-04-28 /pmc/articles/PMC10200164/ /pubmed/37066939 http://dx.doi.org/10.3233/THC-236034 Text en © 2023 – The authors. Published by IOS Press. https://creativecommons.org/licenses/by-nc/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Non-Commercial (CC BY-NC 4.0) License (https://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Comparative analysis of weka-based classification algorithms on medical diagnosis datasets
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10200164/
https://www.ncbi.nlm.nih.gov/pubmed/37066939
http://dx.doi.org/10.3233/THC-236034