Improving the quality of predictive models in small data GSDOT: A new algorithm for generating synthetic data

Bibliographic Details
Main authors: Douzas, Georgios, Lechleitner, Maria, Bacao, Fernando
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8989239/
https://www.ncbi.nlm.nih.gov/pubmed/35390030
http://dx.doi.org/10.1371/journal.pone.0265626
author Douzas, Georgios
Lechleitner, Maria
Bacao, Fernando
collection PubMed
description In the age of the data deluge there are still many domains and applications restricted to the use of small datasets. The ability to harness these small datasets to solve problems through the use of supervised learning methods can have a significant impact in many important areas. The insufficient size of training data usually results in unsatisfactory performance of machine learning algorithms. The current research work aims to contribute to mitigate the small data problem through the creation of artificial instances, which are added to the training process. The proposed algorithm, Geometric Small Data Oversampling Technique, uses geometric regions around existing samples to generate new high quality instances. Experimental results show a significant improvement in accuracy when compared with the use of the initial small dataset as well as other popular artificial data generation techniques.
format Online
Article
Text
id pubmed-8989239
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8989239 2022-04-08 PLoS One Research Article Public Library of Science 2022-04-07 /pmc/articles/PMC8989239/ /pubmed/35390030 http://dx.doi.org/10.1371/journal.pone.0265626 Text en © 2022 Douzas et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Improving the quality of predictive models in small data GSDOT: A new algorithm for generating synthetic data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8989239/
https://www.ncbi.nlm.nih.gov/pubmed/35390030
http://dx.doi.org/10.1371/journal.pone.0265626