
Overlap-Based Undersampling Method for Classification of Imbalanced Medical Datasets

Early diagnosis of life-threatening diseases such as cancers and heart disease is crucial for effective treatment. Supervised machine learning has proved to be a very useful tool for this purpose. Historical patient data, including clinical and demographic information, is used for training lear...


Bibliographic Details
Main Authors: Vuttipittayamongkol, Pattaramon, Elyan, Eyad
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7256568/
http://dx.doi.org/10.1007/978-3-030-49186-4_30
_version_ 1783539939025092608
author Vuttipittayamongkol, Pattaramon
Elyan, Eyad
author_facet Vuttipittayamongkol, Pattaramon
Elyan, Eyad
author_sort Vuttipittayamongkol, Pattaramon
collection PubMed
description Early diagnosis of life-threatening diseases such as cancers and heart disease is crucial for effective treatment. Supervised machine learning has proved to be a very useful tool for this purpose. Historical patient data, including clinical and demographic information, is used to train learning algorithms, which build predictive models that provide initial diagnoses. However, in the medical domain, the positive class is commonly under-represented in a dataset. In such a scenario, a typical learning algorithm tends to be biased towards the negative class, which is the majority class, and to misclassify positive cases. This is known as the class imbalance problem. In this paper, a framework for predictive diagnostics of diseases with imbalanced records is presented. To reduce the classification bias, we propose an overlap-based undersampling method that improves the visibility of minority class samples in the region where the two classes overlap. This is achieved by detecting and removing negative class instances from the overlapping region, which improves class separability in the data space. Experimental results show high accuracy on the positive class, which is highly preferable in the medical domain, together with good trade-offs between sensitivity and specificity. Results also show that the method often outperformed other state-of-the-art and well-established techniques.
format Online
Article
Text
id pubmed-7256568
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7256568 2020-05-29 Overlap-Based Undersampling Method for Classification of Imbalanced Medical Datasets Vuttipittayamongkol, Pattaramon Elyan, Eyad Artificial Intelligence Applications and Innovations Article 2020-05-06 /pmc/articles/PMC7256568/ http://dx.doi.org/10.1007/978-3-030-49186-4_30 Text en © IFIP International Federation for Information Processing 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle Article
Vuttipittayamongkol, Pattaramon
Elyan, Eyad
Overlap-Based Undersampling Method for Classification of Imbalanced Medical Datasets
title Overlap-Based Undersampling Method for Classification of Imbalanced Medical Datasets
title_sort overlap-based undersampling method for classification of imbalanced medical datasets
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7256568/
http://dx.doi.org/10.1007/978-3-030-49186-4_30
work_keys_str_mv AT vuttipittayamongkolpattaramon overlapbasedundersamplingmethodforclassificationofimbalancedmedicaldatasets
AT elyaneyad overlapbasedundersamplingmethodforclassificationofimbalancedmedicaldatasets
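
The description above says that negative (majority) class instances are detected in the region where the two classes overlap and then removed. The record does not state how the overlap region is detected, so the following is only a minimal sketch under an assumed rule: a negative sample counts as overlapping when at least `min_pos` of its `k` nearest neighbours are positive. The function names, the parameters `k` and `min_pos`, and the label convention (1 = positive, 0 = negative) are all hypothetical and are not taken from the paper. A small helper for sensitivity and specificity, the two measures named in the description, is included.

```python
import math


def knn_indices(points, i, k):
    """Return the indices of the k nearest neighbours of points[i], excluding i."""
    dists = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    dists.sort()
    return [j for _, j in dists[:k]]


def remove_overlapping_negatives(X, y, k=3, min_pos=1):
    """Drop negative samples whose neighbourhood contains positives.

    Assumed rule (not the authors' algorithm): a negative sample (label 0)
    is treated as lying in the overlap region when at least `min_pos` of
    its k nearest neighbours are positive (label 1). Positives are kept.
    """
    keep = []
    for i, label in enumerate(y):
        if label == 0:
            n_pos = sum(1 for j in knn_indices(X, i, k) if y[j] == 1)
            if n_pos >= min_pos:
                continue  # negative sample inside the assumed overlap region
        keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec
```

Undersampling of this kind would be applied to the training split only, with a classifier then trained on the reduced data and sensitivity and specificity measured on untouched test data. Larger `min_pos` values remove fewer negatives, which trades some sensitivity for specificity.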