
SynSys: A Synthetic Data Generation System for Healthcare Applications

Creation of realistic synthetic behavior-based sensor data is an important aspect of testing machine learning techniques for healthcare applications. Many of the existing approaches for generating synthetic data are often limited in terms of complexity and realism. We introduce SynSys, a machine lea...


Bibliographic Details
Main Authors: Dahmen, Jessamyn, Cook, Diane
Format: Online Article Text
Language: English
Published: MDPI 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6427177/
https://www.ncbi.nlm.nih.gov/pubmed/30857130
http://dx.doi.org/10.3390/s19051181
author Dahmen, Jessamyn
Cook, Diane
collection PubMed
description Creation of realistic synthetic behavior-based sensor data is an important aspect of testing machine learning techniques for healthcare applications. Many of the existing approaches for generating synthetic data are often limited in terms of complexity and realism. We introduce SynSys, a machine learning-based synthetic data generation method, to improve upon these limitations. We use this method to generate synthetic time series data that is composed of nested sequences using hidden Markov models and regression models which are initially trained on real datasets. We test our synthetic data generation technique on a real annotated smart home dataset. We use time series distance measures as a baseline to determine how realistic the generated data is compared to real data and demonstrate that SynSys produces more realistic data in terms of distance compared to random data generation, data from another home, and data from another time period. Finally, we apply our synthetic data generation technique to the problem of generating data when only a small amount of ground truth data is available. Using semi-supervised learning we demonstrate that SynSys is able to improve activity recognition accuracy compared to using the small amount of real data alone.
format Online
Article
Text
id pubmed-6427177
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-6427177 2019-04-15 SynSys: A Synthetic Data Generation System for Healthcare Applications Dahmen, Jessamyn Cook, Diane Sensors (Basel) Article MDPI 2019-03-08 /pmc/articles/PMC6427177/ /pubmed/30857130 http://dx.doi.org/10.3390/s19051181 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title SynSys: A Synthetic Data Generation System for Healthcare Applications
topic Article