
An Improved Self-Training Method for Positive Unlabeled Time Series Classification Using DTW Barycenter Averaging

Traditional supervised time series classification (TSC) tasks assume that all training data are labeled. In practice, however, manually labeling all unlabeled data can be very time-consuming and often requires the participation of skilled domain experts. In this paper, we are concerned with the positive unlabeled time series classification (PUTSC) problem, which refers to automatically labeling a large unlabeled set U based on a small positive labeled set PL. Self-training (ST) is the most widely used method for solving the PUTSC problem and has attracted increasing attention due to its simplicity and effectiveness. Existing ST methods simply employ the one-nearest-neighbor (1NN) rule to determine which unlabeled time series should be labeled. We note, however, that the 1NN rule might not be optimal for PUTSC tasks, because it can be sensitive to initial labeled data located near the boundary between the positive and negative classes. To overcome this issue, we propose an exploratory methodology called ST-average. Unlike conventional ST-based approaches, ST-average uses the average sequence computed by the DTW barycenter averaging (DBA) technique to label the data. Compared with any individual series in the PL set, the average sequence is more representative. Our proposal is insensitive to the initial labeled data and is more reliable than existing ST-based methods. In addition, we demonstrate that ST-average can naturally be combined with many existing techniques used in the original ST. Experimental results on public datasets show that ST-average performs better than related popular methods.
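
For readers who want the gist of the labeling loop the abstract describes, below is a minimal Python sketch of the ST-average idea, built on the tslearn library's dtw and dtw_barycenter_averaging functions. The function name self_train_average, the n_to_label parameter, and the simple count-based stopping criterion are illustrative assumptions, not the paper's actual implementation (the paper pairs ST with dedicated stopping criteria); equal-length series are assumed, as in most UCR archive datasets.

import numpy as np
from tslearn.metrics import dtw
from tslearn.barycenters import dtw_barycenter_averaging

def self_train_average(pl, u, n_to_label):
    # pl: list of 1-D numpy arrays, the initial positive labeled set PL
    # u:  list of 1-D numpy arrays, the unlabeled set U
    # n_to_label: hypothetical count-based stopping rule (illustrative only)
    pl, u = list(pl), list(u)
    for _ in range(min(n_to_label, len(u))):
        # Average sequence of the current PL set via DTW barycenter averaging.
        avg = dtw_barycenter_averaging(pl).ravel()
        # Label the unlabeled series closest to the average sequence,
        # rather than closest to any single PL member (the 1NN rule that
        # conventional ST uses and ST-average replaces).
        i = min(range(len(u)), key=lambda j: dtw(avg, u[j]))
        pl.append(u.pop(i))
    return pl, u

# Example: three labeled positives, five unlabeled series, label two of them.
rng = np.random.default_rng(0)
pl0 = [np.sin(np.linspace(0, 6, 60)) + 0.1 * rng.standard_normal(60) for _ in range(3)]
u0 = [rng.standard_normal(60) for _ in range(5)]
pl1, u1 = self_train_average(pl0, u0, n_to_label=2)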

Bibliographic Details
Main Authors: Li, Jing, Zhang, Haowen, Dong, Yabo, Zuo, Tongbin, Xu, Duanqing
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8587877/
https://www.ncbi.nlm.nih.gov/pubmed/34770721
http://dx.doi.org/10.3390/s21217414
_version_ 1784598281460908032
author Li, Jing
Zhang, Haowen
Dong, Yabo
Zuo, Tongbin
Xu, Duanqing
author_facet Li, Jing
Zhang, Haowen
Dong, Yabo
Zuo, Tongbin
Xu, Duanqing
author_sort Li, Jing
collection PubMed
description Traditional supervised time series classification (TSC) tasks assume that all training data are labeled. In practice, however, manually labeling all unlabeled data can be very time-consuming and often requires the participation of skilled domain experts. In this paper, we are concerned with the positive unlabeled time series classification (PUTSC) problem, which refers to automatically labeling a large unlabeled set U based on a small positive labeled set PL. Self-training (ST) is the most widely used method for solving the PUTSC problem and has attracted increasing attention due to its simplicity and effectiveness. Existing ST methods simply employ the one-nearest-neighbor (1NN) rule to determine which unlabeled time series should be labeled. We note, however, that the 1NN rule might not be optimal for PUTSC tasks, because it can be sensitive to initial labeled data located near the boundary between the positive and negative classes. To overcome this issue, we propose an exploratory methodology called ST-average. Unlike conventional ST-based approaches, ST-average uses the average sequence computed by the DTW barycenter averaging (DBA) technique to label the data. Compared with any individual series in the PL set, the average sequence is more representative. Our proposal is insensitive to the initial labeled data and is more reliable than existing ST-based methods. In addition, we demonstrate that ST-average can naturally be combined with many existing techniques used in the original ST. Experimental results on public datasets show that ST-average performs better than related popular methods.
format Online
Article
Text
id pubmed-8587877
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8587877 2021-11-13 An Improved Self-Training Method for Positive Unlabeled Time Series Classification Using DTW Barycenter Averaging Li, Jing Zhang, Haowen Dong, Yabo Zuo, Tongbin Xu, Duanqing Sensors (Basel) Article Traditional supervised time series classification (TSC) tasks assume that all training data are labeled. In practice, however, manually labeling all unlabeled data can be very time-consuming and often requires the participation of skilled domain experts. In this paper, we are concerned with the positive unlabeled time series classification (PUTSC) problem, which refers to automatically labeling a large unlabeled set U based on a small positive labeled set PL. Self-training (ST) is the most widely used method for solving the PUTSC problem and has attracted increasing attention due to its simplicity and effectiveness. Existing ST methods simply employ the one-nearest-neighbor (1NN) rule to determine which unlabeled time series should be labeled. We note, however, that the 1NN rule might not be optimal for PUTSC tasks, because it can be sensitive to initial labeled data located near the boundary between the positive and negative classes. To overcome this issue, we propose an exploratory methodology called ST-average. Unlike conventional ST-based approaches, ST-average uses the average sequence computed by the DTW barycenter averaging (DBA) technique to label the data. Compared with any individual series in the PL set, the average sequence is more representative. Our proposal is insensitive to the initial labeled data and is more reliable than existing ST-based methods. In addition, we demonstrate that ST-average can naturally be combined with many existing techniques used in the original ST. Experimental results on public datasets show that ST-average performs better than related popular methods. MDPI 2021-11-08 /pmc/articles/PMC8587877/ /pubmed/34770721 http://dx.doi.org/10.3390/s21217414 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Li, Jing
Zhang, Haowen
Dong, Yabo
Zuo, Tongbin
Xu, Duanqing
An Improved Self-Training Method for Positive Unlabeled Time Series Classification Using DTW Barycenter Averaging
title An Improved Self-Training Method for Positive Unlabeled Time Series Classification Using DTW Barycenter Averaging
title_full An Improved Self-Training Method for Positive Unlabeled Time Series Classification Using DTW Barycenter Averaging
title_fullStr An Improved Self-Training Method for Positive Unlabeled Time Series Classification Using DTW Barycenter Averaging
title_full_unstemmed An Improved Self-Training Method for Positive Unlabeled Time Series Classification Using DTW Barycenter Averaging
title_short An Improved Self-Training Method for Positive Unlabeled Time Series Classification Using DTW Barycenter Averaging
title_sort improved self-training method for positive unlabeled time series classification using dtw barycenter averaging
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8587877/
https://www.ncbi.nlm.nih.gov/pubmed/34770721
http://dx.doi.org/10.3390/s21217414
work_keys_str_mv AT lijing animprovedselftrainingmethodforpositiveunlabeledtimeseriesclassificationusingdtwbarycenteraveraging
AT zhanghaowen animprovedselftrainingmethodforpositiveunlabeledtimeseriesclassificationusingdtwbarycenteraveraging
AT dongyabo animprovedselftrainingmethodforpositiveunlabeledtimeseriesclassificationusingdtwbarycenteraveraging
AT zuotongbin animprovedselftrainingmethodforpositiveunlabeledtimeseriesclassificationusingdtwbarycenteraveraging
AT xuduanqing animprovedselftrainingmethodforpositiveunlabeledtimeseriesclassificationusingdtwbarycenteraveraging
AT lijing improvedselftrainingmethodforpositiveunlabeledtimeseriesclassificationusingdtwbarycenteraveraging
AT zhanghaowen improvedselftrainingmethodforpositiveunlabeledtimeseriesclassificationusingdtwbarycenteraveraging
AT dongyabo improvedselftrainingmethodforpositiveunlabeledtimeseriesclassificationusingdtwbarycenteraveraging
AT zuotongbin improvedselftrainingmethodforpositiveunlabeledtimeseriesclassificationusingdtwbarycenteraveraging
AT xuduanqing improvedselftrainingmethodforpositiveunlabeledtimeseriesclassificationusingdtwbarycenteraveraging