Outlier-SMOTE: A refined oversampling technique for improved detection of COVID-19
Almost every dataset today faces the problem of class imbalance. Classifiers are difficult to train on such data because they become biased towards a subset of the classes, which reduces their performance. This setback is often tackled by the use of various...
Main Authors: | Turlapati, Venkata Pavan Kumar; Prusty, Manas Ranjan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
The Authors. Published by Elsevier B.V.
2020
|
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7710484/ https://www.ncbi.nlm.nih.gov/pubmed/33289013 http://dx.doi.org/10.1016/j.ibmed.2020.100023 |
_version_ | 1783617957287428096 |
---|---|
author | Turlapati, Venkata Pavan Kumar Prusty, Manas Ranjan |
author_facet | Turlapati, Venkata Pavan Kumar Prusty, Manas Ranjan |
author_sort | Turlapati, Venkata Pavan Kumar |
collection | PubMed |
description | Almost every dataset today faces the problem of class imbalance. Classifiers are difficult to train on such data because they become biased towards a subset of the classes, which reduces their performance. This problem is often tackled with over-sampling or under-sampling algorithms, among which the Synthetic Minority Oversampling Technique (SMOTE) stands out. SMOTE generates synthetic samples of the minority class by forming linear combinations of each minority data point and its neighboring minority-class points, and each minority sample generates an equal number of synthetic samples. With the world suffering from the COVID-19 pandemic, the authors applied this idea to improve classification performance in detecting the virus. This paper presents a modified version of SMOTE, known as Outlier-SMOTE, in which each data point is oversampled according to its distance from the other data points. A data point that lies farther from the others is given greater importance and is oversampled more than its counterparts. Outlier-SMOTE reduces the chance of overlapping minority samples, which often occurs with the traditional SMOTE algorithm. The method is tested on five benchmark datasets and then on a COVID-19 dataset. F-measure, recall, and precision are used as the principal metrics to evaluate classifier performance, as is standard for class-imbalanced datasets. The proposed algorithm performs considerably better than the traditional SMOTE algorithm on the considered datasets. |
format | Online Article Text |
id | pubmed-7710484 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | The Authors. Published by Elsevier B.V. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7710484 2020-12-03 Outlier-SMOTE: A refined oversampling technique for improved detection of COVID-19 Turlapati, Venkata Pavan Kumar Prusty, Manas Ranjan Intell Based Med Article Almost every dataset today faces the problem of class imbalance. Classifiers are difficult to train on such data because they become biased towards a subset of the classes, which reduces their performance. This problem is often tackled with over-sampling or under-sampling algorithms, among which the Synthetic Minority Oversampling Technique (SMOTE) stands out. SMOTE generates synthetic samples of the minority class by forming linear combinations of each minority data point and its neighboring minority-class points, and each minority sample generates an equal number of synthetic samples. With the world suffering from the COVID-19 pandemic, the authors applied this idea to improve classification performance in detecting the virus. This paper presents a modified version of SMOTE, known as Outlier-SMOTE, in which each data point is oversampled according to its distance from the other data points. A data point that lies farther from the others is given greater importance and is oversampled more than its counterparts. Outlier-SMOTE reduces the chance of overlapping minority samples, which often occurs with the traditional SMOTE algorithm. The method is tested on five benchmark datasets and then on a COVID-19 dataset. F-measure, recall, and precision are used as the principal metrics to evaluate classifier performance, as is standard for class-imbalanced datasets. The proposed algorithm performs considerably better than the traditional SMOTE algorithm on the considered datasets. The Authors. Published by Elsevier B.V. 
2020-12 2020-12-03 /pmc/articles/PMC7710484/ /pubmed/33289013 http://dx.doi.org/10.1016/j.ibmed.2020.100023 Text en © 2020 The Authors Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active. |
spellingShingle | Article Turlapati, Venkata Pavan Kumar Prusty, Manas Ranjan Outlier-SMOTE: A refined oversampling technique for improved detection of COVID-19 |
title | Outlier-SMOTE: A refined oversampling technique for improved detection of COVID-19 |
title_full | Outlier-SMOTE: A refined oversampling technique for improved detection of COVID-19 |
title_fullStr | Outlier-SMOTE: A refined oversampling technique for improved detection of COVID-19 |
title_full_unstemmed | Outlier-SMOTE: A refined oversampling technique for improved detection of COVID-19 |
title_short | Outlier-SMOTE: A refined oversampling technique for improved detection of COVID-19 |
title_sort | outlier-smote: a refined oversampling technique for improved detection of covid-19 |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7710484/ https://www.ncbi.nlm.nih.gov/pubmed/33289013 http://dx.doi.org/10.1016/j.ibmed.2020.100023 |
work_keys_str_mv | AT turlapativenkatapavankumar outliersmotearefinedoversamplingtechniqueforimproveddetectionofcovid19 AT prustymanasranjan outliersmotearefinedoversamplingtechniqueforimproveddetectionofcovid19 |
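
The description above says that Outlier-SMOTE oversamples each minority data point according to its distance from the other data points, whereas SMOTE gives every point an equal share. The record does not state the exact weighting formula, so the following is only a minimal sketch under stated assumptions: each point's share is taken to be proportional to the sum of its Euclidean distances to the other minority points, and synthetic samples are drawn by SMOTE-style linear interpolation towards one of the k nearest minority neighbors. The function names `outlier_weighted_counts` and `outlier_smote_sketch` and the parameters `n_synthetic`, `k`, and `seed` are hypothetical and are not taken from the paper.

```python
import numpy as np


def outlier_weighted_counts(X_min, n_synthetic):
    """Split n_synthetic among minority points by total distance to the others.

    ASSUMPTION: weight = sum of Euclidean distances, normalized to 1.
    The catalog record does not give the paper's exact formula.
    """
    diffs = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))      # pairwise distance matrix
    totals = dist.sum(axis=1)                      # larger for outlying points
    if totals.sum() == 0:                          # all points identical
        weights = np.full(len(X_min), 1.0 / len(X_min))
    else:
        weights = totals / totals.sum()
    counts = np.floor(weights * n_synthetic).astype(int)
    # Hand the leftover samples to the largest fractional remainders.
    leftover = n_synthetic - counts.sum()
    order = np.argsort(-(weights * n_synthetic - counts))
    counts[order[:leftover]] += 1
    return counts, dist


def outlier_smote_sketch(X_min, n_synthetic, k=5, seed=0):
    """Generate n_synthetic minority samples, favoring outlying points."""
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    rng = np.random.default_rng(seed)
    counts, dist = outlier_weighted_counts(X_min, n_synthetic)
    np.fill_diagonal(dist, np.inf)                 # a point is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :min(k, n - 1)]
    synthetic = []
    for i, count in enumerate(counts):
        for _ in range(count):
            j = rng.choice(neighbors[i])           # random nearest minority neighbor
            gap = rng.random()                     # interpolation factor in [0, 1)
            synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic).reshape(-1, X_min.shape[1])


if __name__ == "__main__":
    # Toy minority class: three clustered points and one distant point.
    X_min = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0]])
    counts, _ = outlier_weighted_counts(X_min, 20)
    print("synthetic samples per point:", counts)
    print("generated shape:", outlier_smote_sketch(X_min, 20, k=2).shape)
```

In this toy input the distant point receives the largest share of the synthetic samples, which is the behavior the description attributes to Outlier-SMOTE. Setting every weight equal would recover the equal-allocation behavior described for traditional SMOTE.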