Outlier-SMOTE: A refined oversampling technique for improved detection of COVID-19

Almost every dataset these days continually faces the predicament of class imbalance. It is difficult to train classifiers on these types of data as they become biased towards a set of classes, hence leading to reduction in classifier performance. This setback is often tackled by the use of various...

Bibliographic Details
Main Authors: Turlapati, Venkata Pavan Kumar, Prusty, Manas Ranjan
Format: Online Article Text
Language: English
Published: The Authors. Published by Elsevier B.V. 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7710484/
https://www.ncbi.nlm.nih.gov/pubmed/33289013
http://dx.doi.org/10.1016/j.ibmed.2020.100023
_version_ 1783617957287428096
author Turlapati, Venkata Pavan Kumar
Prusty, Manas Ranjan
collection PubMed
description Almost every dataset these days faces the predicament of class imbalance. It is difficult to train classifiers on such data because they become biased towards a set of classes, which reduces classifier performance. This setback is often tackled with various over-sampling or under-sampling algorithms, among which the Synthetic Minority Oversampling Technique (SMOTE) stands out. SMOTE generates synthetic samples of the minority class by oversampling each data point using linear combinations of existing minority-class neighbors. Each minority data sample generates an equal number of synthetic samples. As the world is suffering from the COVID-19 pandemic, the authors applied the idea to help boost classification performance in detecting this deadly virus. This paper presents a modified version of SMOTE, known as Outlier-SMOTE, in which each data point is oversampled according to its distance from the other data points. A data point that lies farther from the others is given greater importance and is oversampled more than its counterparts. Outlier-SMOTE reduces the chance of overlap among minority data samples, which often occurs in the traditional SMOTE algorithm. The method is tested on five benchmark datasets and then on a COVID-19 dataset. F-measure, recall, and precision are used as the principal metrics to evaluate classifier performance, as is common for class-imbalanced datasets. The proposed algorithm performs considerably better than the traditional SMOTE algorithm on the considered datasets.
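The idea described in the abstract can be illustrated with a minimal sketch. It is one plausible reading, not the authors' implementation: the record does not say how "distance from other data-points" is measured or how sample counts are allocated. The function name `outlier_weighted_smote`, the use of the summed Euclidean distance as a remoteness score, the proportional allocation, and the parameters `k` and `seed` are all assumptions made here for illustration.

```python
import math
import random


def outlier_weighted_smote(minority, n_synthetic, k=3, seed=0):
    """Sketch: oversample minority points in proportion to their remoteness.

    Assumptions (not taken from the catalog record): remoteness is the sum
    of Euclidean distances to all other minority points, and per-point
    sample counts come from normalising those sums.
    """
    rng = random.Random(seed)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority points")
    k = min(k, n - 1)

    # Pairwise Euclidean distances between minority points.
    dist = [[math.dist(p, q) for q in minority] for p in minority]

    # Remoteness score per point, normalised into allocation weights.
    scores = [sum(row) for row in dist]
    total = sum(scores)
    weights = [s / total for s in scores] if total > 0 else [1.0 / n] * n

    # Allocate counts: floor of each share, then give the remainder to the
    # points with the largest fractional parts so the counts sum exactly.
    shares = [w * n_synthetic for w in weights]
    counts = [int(s) for s in shares]
    leftover = n_synthetic - sum(counts)
    by_fraction = sorted(range(n), key=lambda i: shares[i] - counts[i], reverse=True)
    for i in by_fraction[:leftover]:
        counts[i] += 1

    synthetic = []
    for i, point in enumerate(minority):
        # The k nearest minority neighbours, excluding the point itself.
        neighbours = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for _ in range(counts[i]):
            other = minority[rng.choice(neighbours)]
            gap = rng.random()  # position along the segment, in [0, 1)
            # SMOTE-style interpolation between the point and a neighbour.
            synthetic.append(tuple(a + gap * (b - a) for a, b in zip(point, other)))
    return synthetic, counts


# Three clustered points and one remote point: the remote point receives
# the largest share of the synthetic samples.
minority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
synthetic, counts = outlier_weighted_smote(minority, n_synthetic=10, k=2)
```

Plain SMOTE would give every minority point the same count. In this sketch the only change is the allocation step, and the interpolation between a point and one of its nearest minority neighbors is left as in SMOTE.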
format Online
Article
Text
id pubmed-7710484
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher The Authors. Published by Elsevier B.V.
record_format MEDLINE/PubMed
spelling pubmed-7710484 2020-12-03 Outlier-SMOTE: A refined oversampling technique for improved detection of COVID-19 Turlapati, Venkata Pavan Kumar Prusty, Manas Ranjan Intell Based Med Article The Authors. Published by Elsevier B.V.
2020-12 2020-12-03 /pmc/articles/PMC7710484/ /pubmed/33289013 http://dx.doi.org/10.1016/j.ibmed.2020.100023 Text en © 2020 The Authors Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
title Outlier-SMOTE: A refined oversampling technique for improved detection of COVID-19
topic Article