Analysis of Learning Influence of Training Data Selected by Distribution Consistency

Bibliographic Details
Main authors: Hwang, Myunggwon, Jeong, Yuna, Sung, Won-Kyung
Format: Online, Article, Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7913647/
https://www.ncbi.nlm.nih.gov/pubmed/33557021
http://dx.doi.org/10.3390/s21041045
author Hwang, Myunggwon
Jeong, Yuna
Sung, Won-Kyung
collection PubMed
description This study proposes a method for selecting core data that benefit machine learning. Specifically, we form a two-dimensional distribution based on the similarity of the training data and partition that distribution into grids with fixed ratios. In each grid, we select data based on the distribution consistency (DC) of the target-class data and examine how the selection affects the classifier. We use CIFAR-10 for the experiments and vary the grid ratio from 0.5 to 0.005. The influence of these variables was analyzed for different training-data sizes under three selection criteria: high DC, low DC (the inverse of high DC), and random (no criterion). As a result, average accuracy improved by 0.95 percentage points (±0.65) and 1.54 percentage points (±0.59) for grid ratios of 0.008 and 0.005, respectively. These outcomes indicate improved performance compared with the existing approach (data distribution search). In this study, we confirmed that learning performance improved when the training data were selected with very small grids and high DC.
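The abstract does not spell out how DC is computed or how the two-dimensional distribution is obtained, so the following Python sketch is illustrative only and is not the authors' implementation. It assumes (1) the 2D coordinates of each training sample are already given, for example from a dimensionality-reduction step; (2) a grid ratio r means each cell spans r of each axis's range, so r = 0.005 yields 200 × 200 cells; and (3) the DC of a cell is the fraction of its samples that belong to the target class. The helpers `grid_cell` and `select_by_dc` are hypothetical names.

```python
import random
from collections import defaultdict


def grid_cell(point, mins, maxs, ratio):
    """Map a 2D point to an (i, j) cell; each cell spans `ratio` of the axis range.

    ASSUMPTION: this reading of "grid ratio" is illustrative, not from the paper.
    """
    n = max(1, round(1 / ratio))  # cells per axis, e.g. ratio 0.005 -> 200
    idx = []
    for v, lo, hi in zip(point, mins, maxs):
        span = (hi - lo) or 1.0  # guard against a degenerate (flat) axis
        idx.append(min(int((v - lo) / span * n), n - 1))  # clamp the max edge
    return tuple(idx)


def select_by_dc(points, labels, target, ratio, k, mode="high", seed=0):
    """Pick up to k indices of class `target`, ranked by the DC of their cell.

    ASSUMPTION: DC of a cell = fraction of the cell's samples labelled `target`.
    mode: "high" (most consistent cells first), "low" (inverse), or "random".
    """
    if mode not in ("high", "low", "random"):
        raise ValueError(f"unknown mode: {mode}")
    mins = (min(p[0] for p in points), min(p[1] for p in points))
    maxs = (max(p[0] for p in points), max(p[1] for p in points))
    cells = [grid_cell(p, mins, maxs, ratio) for p in points]

    # Count all samples and target-class samples per cell, then take the ratio.
    total, hits = defaultdict(int), defaultdict(int)
    for cell, y in zip(cells, labels):
        total[cell] += 1
        hits[cell] += int(y == target)
    dc = {cell: hits[cell] / total[cell] for cell in total}

    candidates = [i for i, y in enumerate(labels) if y == target]
    k = min(k, len(candidates))
    if mode == "random":
        return sorted(random.Random(seed).sample(candidates, k))
    # sorted() is stable, so ties keep their original index order
    ranked = sorted(candidates, key=lambda i: dc[cells[i]], reverse=(mode == "high"))
    return ranked[:k]


# Toy 2D embedding: three class-0 points, two class-1 points
points = [(0.0, 0.0), (0.1, 0.1), (0.9, 0.9), (1.0, 1.0), (0.95, 0.8)]
labels = [0, 0, 0, 1, 1]
print(select_by_dc(points, labels, target=0, ratio=0.5, k=2, mode="high"))  # [0, 1]
print(select_by_dc(points, labels, target=0, ratio=0.5, k=2, mode="low"))   # [2, 0]
```

Ranking candidates by the DC of their cell is the simplest reading of "high-DC" selection; a per-cell quota or a DC threshold would be equally plausible under the abstract's description.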
format Online
Article
Text
id pubmed-7913647
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
journal Sensors (Basel)
date 2021-02-04
rights © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Analysis of Learning Influence of Training Data Selected by Distribution Consistency
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7913647/
https://www.ncbi.nlm.nih.gov/pubmed/33557021
http://dx.doi.org/10.3390/s21041045