An Improved Self-Training Method for Positive Unlabeled Time Series Classification Using DTW Barycenter Averaging
Traditional supervised time series classification (TSC) tasks assume that all training data are labeled. However, in practice, manually labelling all unlabeled data can be very time-consuming and often requires the participation of skilled domain experts. In this paper, we are concerned with the positive...
Main Authors: Li, Jing; Zhang, Haowen; Dong, Yabo; Zuo, Tongbin; Xu, Duanqing
Format: Online Article Text
Language: English
Published: MDPI, 2021
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8587877/ https://www.ncbi.nlm.nih.gov/pubmed/34770721 http://dx.doi.org/10.3390/s21217414
_version_ | 1784598281460908032 |
---|---|
author | Li, Jing Zhang, Haowen Dong, Yabo Zuo, Tongbin Xu, Duanqing |
author_facet | Li, Jing Zhang, Haowen Dong, Yabo Zuo, Tongbin Xu, Duanqing |
author_sort | Li, Jing |
collection | PubMed |
description | Traditional supervised time series classification (TSC) tasks assume that all training data are labeled. However, in practice, manually labelling all unlabeled data can be very time-consuming and often requires the participation of skilled domain experts. In this paper, we are concerned with the positive unlabeled time series classification problem (PUTSC), which refers to automatically labelling a large unlabeled set U based on a small positive labeled set PL. Self-training (ST) is the most widely used method for solving the PUTSC problem and has attracted increased attention due to its simplicity and effectiveness. Existing ST methods simply employ the one-nearest-neighbor (1NN) rule to determine which unlabeled time series should be labeled. Nevertheless, we note that the 1NN rule might not be optimal for PUTSC tasks because it may be sensitive to initial labeled data located near the boundary between the positive and negative classes. To overcome this issue, in this paper we propose an exploratory methodology called ST-average. Unlike conventional ST-based approaches, ST-average utilizes the average sequence, calculated by the DTW barycenter averaging (DBA) technique, to label the data. Compared with any individual in the PL set, the average sequence is more representative. Our proposal is insensitive to the initial labeled data and is more reliable than existing ST-based methods. Moreover, we demonstrate that ST-average can naturally be implemented along with many existing techniques used in the original ST. Experimental results on public datasets show that ST-average performs better than related popular methods. |
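The abstract describes the ST-average idea only at a high level: instead of labeling the unlabeled series nearest to any single labeled example (the 1NN rule), it labels the one nearest to the DBA barycenter of the positive labeled set. A minimal, illustrative sketch of that loop follows; it uses a plain dynamic-programming DTW and a simplified DBA iteration, and all function names, iteration counts, and the selection of one series per round are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def dtw_path(a, b):
    """DTW between two 1-D sequences; returns (distance, warping path)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = (a[i - 1] - b[j - 1]) ** 2 + min(
                D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # backtrack the optimal alignment from the bottom-right corner
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return float(np.sqrt(D[n, m])), path[::-1]

def dba(seqs, n_iters=5):
    """Simplified DTW Barycenter Averaging: align every series to the
    current average, then replace each average point by the mean of the
    series points warped onto it."""
    avg = np.array(seqs[0], dtype=float)
    for _ in range(n_iters):
        buckets = [[] for _ in range(len(avg))]
        for s in seqs:
            _, path = dtw_path(avg, s)
            for i, j in path:
                buckets[i].append(s[j])
        avg = np.array([np.mean(b) for b in buckets])
    return avg

def st_average(pl, u, n_labels):
    """Self-training with a barycenter: repeatedly label the unlabeled
    series whose DTW distance to the DBA average of PL is smallest."""
    pl, u = [np.asarray(s, float) for s in pl], list(u)
    picked = []
    for _ in range(n_labels):
        avg = dba(pl)
        dists = [dtw_path(avg, np.asarray(s, float))[0] for s in u]
        k = int(np.argmin(dists))
        picked.append(u[k])
        pl.append(np.asarray(u.pop(k), float))
    return picked
```

Because the barycenter summarizes the whole PL set, a single labeled example sitting near the class boundary cannot by itself pull a borderline negative series into PL, which is the sensitivity the paper attributes to the 1NN rule.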
format | Online Article Text |
id | pubmed-8587877 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-8587877 2021-11-13 An Improved Self-Training Method for Positive Unlabeled Time Series Classification Using DTW Barycenter Averaging Li, Jing Zhang, Haowen Dong, Yabo Zuo, Tongbin Xu, Duanqing Sensors (Basel) Article Traditional supervised time series classification (TSC) tasks assume that all training data are labeled. However, in practice, manually labelling all unlabeled data can be very time-consuming and often requires the participation of skilled domain experts. In this paper, we are concerned with the positive unlabeled time series classification problem (PUTSC), which refers to automatically labelling a large unlabeled set U based on a small positive labeled set PL. Self-training (ST) is the most widely used method for solving the PUTSC problem and has attracted increased attention due to its simplicity and effectiveness. Existing ST methods simply employ the one-nearest-neighbor (1NN) rule to determine which unlabeled time series should be labeled. Nevertheless, we note that the 1NN rule might not be optimal for PUTSC tasks because it may be sensitive to initial labeled data located near the boundary between the positive and negative classes. To overcome this issue, in this paper we propose an exploratory methodology called ST-average. Unlike conventional ST-based approaches, ST-average utilizes the average sequence, calculated by the DTW barycenter averaging (DBA) technique, to label the data. Compared with any individual in the PL set, the average sequence is more representative. Our proposal is insensitive to the initial labeled data and is more reliable than existing ST-based methods. Moreover, we demonstrate that ST-average can naturally be implemented along with many existing techniques used in the original ST. Experimental results on public datasets show that ST-average performs better than related popular methods.
MDPI 2021-11-08 /pmc/articles/PMC8587877/ /pubmed/34770721 http://dx.doi.org/10.3390/s21217414 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Li, Jing Zhang, Haowen Dong, Yabo Zuo, Tongbin Xu, Duanqing An Improved Self-Training Method for Positive Unlabeled Time Series Classification Using DTW Barycenter Averaging |
title | An Improved Self-Training Method for Positive Unlabeled Time Series Classification Using DTW Barycenter Averaging |
title_full | An Improved Self-Training Method for Positive Unlabeled Time Series Classification Using DTW Barycenter Averaging |
title_fullStr | An Improved Self-Training Method for Positive Unlabeled Time Series Classification Using DTW Barycenter Averaging |
title_full_unstemmed | An Improved Self-Training Method for Positive Unlabeled Time Series Classification Using DTW Barycenter Averaging |
title_short | An Improved Self-Training Method for Positive Unlabeled Time Series Classification Using DTW Barycenter Averaging |
title_sort | improved self-training method for positive unlabeled time series classification using dtw barycenter averaging |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8587877/ https://www.ncbi.nlm.nih.gov/pubmed/34770721 http://dx.doi.org/10.3390/s21217414 |
work_keys_str_mv | AT lijing animprovedselftrainingmethodforpositiveunlabeledtimeseriesclassificationusingdtwbarycenteraveraging AT zhanghaowen animprovedselftrainingmethodforpositiveunlabeledtimeseriesclassificationusingdtwbarycenteraveraging AT dongyabo animprovedselftrainingmethodforpositiveunlabeledtimeseriesclassificationusingdtwbarycenteraveraging AT zuotongbin animprovedselftrainingmethodforpositiveunlabeledtimeseriesclassificationusingdtwbarycenteraveraging AT xuduanqing animprovedselftrainingmethodforpositiveunlabeledtimeseriesclassificationusingdtwbarycenteraveraging AT lijing improvedselftrainingmethodforpositiveunlabeledtimeseriesclassificationusingdtwbarycenteraveraging AT zhanghaowen improvedselftrainingmethodforpositiveunlabeledtimeseriesclassificationusingdtwbarycenteraveraging AT dongyabo improvedselftrainingmethodforpositiveunlabeledtimeseriesclassificationusingdtwbarycenteraveraging AT zuotongbin improvedselftrainingmethodforpositiveunlabeledtimeseriesclassificationusingdtwbarycenteraveraging AT xuduanqing improvedselftrainingmethodforpositiveunlabeledtimeseriesclassificationusingdtwbarycenteraveraging |