Label Noise Cleaning with an Adaptive Ensemble Method Based on Noise Detection Metric
Real-world datasets are often contaminated with label noise; labeling is not a clear-cut process, and reliable labeling methods tend to be expensive or time-consuming. Depending on the learning technique used, such label noise can be harmful, requiring a larger training set, making the...
Main authors: | Feng, Wei; Quan, Yinghui; Dauphin, Gabriel |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7727820/ https://www.ncbi.nlm.nih.gov/pubmed/33255363 http://dx.doi.org/10.3390/s20236718 |
_version_ | 1783621137214734336 |
---|---|
author | Feng, Wei; Quan, Yinghui; Dauphin, Gabriel |
collection | PubMed |
description | Real-world datasets are often contaminated with label noise; labeling is not a clear-cut process, and reliable labeling methods tend to be expensive or time-consuming. Depending on the learning technique used, such label noise can be harmful: it may require a larger training set, make the trained model more complex and more prone to overfitting, and yield less accurate predictions. This work proposes a cleaning technique called the ensemble method based on the noise detection metric (ENDM). An ensemble classifier is first learned from the corrupted training set and used to derive four metrics assessing the likelihood that a sample is mislabeled. For each metric, three thresholds are set to maximize the classification performance on a corrupted validation dataset when using three different ensemble classifiers, namely Bagging, AdaBoost and k-nearest neighbor (k-NN). These thresholds are used to identify the corrupted samples and then either remove or correct them. The effectiveness of ENDM is demonstrated on the classification of 15 public datasets. A comparative analysis is conducted against the homogeneous-ensembles-based majority vote method and consensus vote method, two popular ensemble-based label noise filters. |
format | Online Article Text |
id | pubmed-7727820 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-7727820 2020-12-11 Sensors (Basel) Article MDPI 2020-11-24 /pmc/articles/PMC7727820/ /pubmed/33255363 http://dx.doi.org/10.3390/s20236718 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
title | Label Noise Cleaning with an Adaptive Ensemble Method Based on Noise Detection Metric |
topic | Article |
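The abstract describes two baseline filters (majority vote and consensus vote over a homogeneous ensemble) and ENDM's idea of thresholding a per-sample noise metric, then removing or correcting the flagged samples. The following is a minimal pure-Python sketch of those ideas under stated assumptions, not the authors' implementation. The ensemble members (leave-one-out k-NN with k = 1, 3, 5), the single noise metric (fraction of members disagreeing with the given label), and all function names are hypothetical choices. The record does not name ENDM's four metrics, and ENDM tunes its thresholds on a corrupted validation set rather than taking them as a fixed argument, as this sketch does.

```python
from collections import Counter
from typing import List, Sequence


def knn_predict_loo(xs: Sequence[float], ys: Sequence[int], i: int, k: int) -> int:
    """Leave-one-out k-NN on 1-D data: predict sample i from all *other* samples.

    Distance ties are broken by index; vote ties go to the smallest label.
    """
    others = sorted((abs(xs[j] - xs[i]), j) for j in range(len(xs)) if j != i)
    votes = Counter(ys[j] for _, j in others[:k])
    return max(sorted(votes), key=votes.get)


def member_predictions(xs, ys, ks=(1, 3, 5)) -> List[List[int]]:
    """Hypothetical homogeneous ensemble: same learner (k-NN), one member per k.

    Returns one row per sample and one column per member.
    """
    return [[knn_predict_loo(xs, ys, i, k) for k in ks] for i in range(len(xs))]


def noise_scores(preds, ys) -> List[float]:
    """Hypothetical noise metric: fraction of members disagreeing with the label."""
    return [sum(p != y for p in row) / len(row) for row, y in zip(preds, ys)]


def majority_vote_filter(preds, ys) -> List[int]:
    """Flag a sample when more than half of the members misclassify it."""
    return [i for i, (row, y) in enumerate(zip(preds, ys))
            if 2 * sum(p != y for p in row) > len(row)]


def consensus_vote_filter(preds, ys) -> List[int]:
    """Flag a sample only when every member misclassifies it."""
    return [i for i, (row, y) in enumerate(zip(preds, ys))
            if all(p != y for p in row)]


def threshold_clean(preds, ys, threshold, action="remove"):
    """Flag samples whose noise score is >= threshold, then remove or relabel them.

    Returns (kept_indices, labels). With action="correct", each flagged sample
    is relabeled with the members' majority prediction and nothing is dropped.
    """
    scores = noise_scores(preds, ys)
    flagged = {i for i, s in enumerate(scores) if s >= threshold}
    if action == "remove":
        kept = [i for i in range(len(ys)) if i not in flagged]
        return kept, [ys[i] for i in kept]
    if action == "correct":
        cleaned = []
        for i, (row, y) in enumerate(zip(preds, ys)):
            if i in flagged:
                votes = Counter(row)
                y = max(sorted(votes), key=votes.get)
            cleaned.append(y)
        return list(range(len(ys))), cleaned
    raise ValueError(f"unknown action: {action!r}")


if __name__ == "__main__":
    xs = [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]
    ys = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]  # sample 2 is deliberately mislabeled
    preds = member_predictions(xs, ys)
    print(majority_vote_filter(preds, ys))                # [2]
    print(consensus_vote_filter(preds, ys))               # [2]
    print(threshold_clean(preds, ys, 0.3, "remove")[0])   # [0, 1, 4, 5, 6, 7, 8, 9]
    print(threshold_clean(preds, ys, 0.5, "correct")[1])  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

In this toy data the threshold of 0.3 also removes the clean sample 3, whose 1-NN neighbor is the mislabeled point. That sensitivity is why a method like ENDM would choose thresholds by measuring downstream classification performance instead of fixing them by hand.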