Nearest labelset using double distances for multi-label classification

Multi-label classification is a type of supervised learning where an instance may belong to multiple labels simultaneously. Predicting each label independently has been criticized for not exploiting any correlation between labels. In this article we propose a novel approach, Nearest Labelset using D...

Bibliographic Details
Main Authors:	Gweon, Hyukjun, Schonlau, Matthias, Steiner, Stefan H.
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2019
Subjects:	Data Mining and Machine Learning
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924696/ https://www.ncbi.nlm.nih.gov/pubmed/33816895 http://dx.doi.org/10.7717/peerj-cs.242

_version_	1783659143595294720
author	Gweon, Hyukjun Schonlau, Matthias Steiner, Stefan H.
author_facet	Gweon, Hyukjun Schonlau, Matthias Steiner, Stefan H.
author_sort	Gweon, Hyukjun
collection	PubMed
description	Multi-label classification is a type of supervised learning where an instance may belong to multiple labels simultaneously. Predicting each label independently has been criticized for not exploiting any correlation between labels. In this article we propose a novel approach, Nearest Labelset using Double Distances (NLDD), that predicts the labelset observed in the training data that minimizes a weighted sum of the distances in both the feature space and the label space to the new instance. The weights specify the relative tradeoff between the two distances. The weights are estimated from a binomial regression of the number of misclassified labels as a function of the two distances. Model parameters are estimated by maximum likelihood. NLDD only considers labelsets observed in the training data, thus implicitly taking into account label dependencies. Experiments on benchmark multi-label data sets show that the proposed method on average outperforms other well-known approaches in terms of 0/1 loss, and multi-label accuracy and ranks second on the F-measure (after a method called ECC) and on Hamming loss (after a method called RF-PCT).
format	Online Article Text
id	pubmed-7924696
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-79246962021-04-02 Nearest labelset using double distances for multi-label classification Gweon, Hyukjun Schonlau, Matthias Steiner, Stefan H. PeerJ Comput Sci Data Mining and Machine Learning Multi-label classification is a type of supervised learning where an instance may belong to multiple labels simultaneously. Predicting each label independently has been criticized for not exploiting any correlation between labels. In this article we propose a novel approach, Nearest Labelset using Double Distances (NLDD), that predicts the labelset observed in the training data that minimizes a weighted sum of the distances in both the feature space and the label space to the new instance. The weights specify the relative tradeoff between the two distances. The weights are estimated from a binomial regression of the number of misclassified labels as a function of the two distances. Model parameters are estimated by maximum likelihood. NLDD only considers labelsets observed in the training data, thus implicitly taking into account label dependencies. Experiments on benchmark multi-label data sets show that the proposed method on average outperforms other well-known approaches in terms of 0/1 loss, and multi-label accuracy and ranks second on the F-measure (after a method called ECC) and on Hamming loss (after a method called RF-PCT). PeerJ Inc. 2019-12-09 /pmc/articles/PMC7924696/ /pubmed/33816895 http://dx.doi.org/10.7717/peerj-cs.242 Text en ©2019 Gweon et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
spellingShingle	Data Mining and Machine Learning Gweon, Hyukjun Schonlau, Matthias Steiner, Stefan H. Nearest labelset using double distances for multi-label classification
title	Nearest labelset using double distances for multi-label classification
title_full	Nearest labelset using double distances for multi-label classification
title_fullStr	Nearest labelset using double distances for multi-label classification
title_full_unstemmed	Nearest labelset using double distances for multi-label classification
title_short	Nearest labelset using double distances for multi-label classification
title_sort	nearest labelset using double distances for multi-label classification
topic	Data Mining and Machine Learning
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924696/ https://www.ncbi.nlm.nih.gov/pubmed/33816895 http://dx.doi.org/10.7717/peerj-cs.242
work_keys_str_mv	AT gweonhyukjun nearestlabelsetusingdoubledistancesformultilabelclassification AT schonlaumatthias nearestlabelsetusingdoubledistancesformultilabelclassification AT steinerstefanh nearestlabelsetusingdoubledistancesformultilabelclassification
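
The description field in this record says NLDD predicts the training labelset that minimizes a weighted sum of a feature-space distance and a label-space distance to the new instance. The Python sketch below illustrates only that selection step and is not the authors' implementation. Several pieces are assumptions not stated in the record: the function name `nldd_predict`, the use of Euclidean distance in both spaces, and the input `p_new`, a vector of per-label scores for the new instance (for example, from independent per-label classifiers). The weights are passed in as plain numbers; the binomial regression that the description says estimates them is not reproduced here.

```python
import numpy as np


def nldd_predict(x_new, p_new, X_train, Y_train, w_feature=1.0, w_label=1.0):
    """Pick the training labelset with the smallest weighted double distance.

    x_new:   feature vector of the new instance, shape (d,)
    p_new:   ASSUMED per-label scores for the new instance, shape (L,)
    X_train: training feature matrix, shape (n, d)
    Y_train: binary training labelsets, shape (n, L)
    w_*:     trade-off weights (given here, not estimated)
    """
    x_new = np.asarray(x_new, dtype=float)
    p_new = np.asarray(p_new, dtype=float)
    X_train = np.asarray(X_train, dtype=float)
    Y_train = np.asarray(Y_train, dtype=float)

    # Distance from the new instance to every training instance in feature space.
    d_feature = np.linalg.norm(X_train - x_new, axis=1)
    # Distance from the assumed label scores to every training labelset.
    d_label = np.linalg.norm(Y_train - p_new, axis=1)

    # Weighted sum of the two distances; the smallest value wins.
    score = w_feature * d_feature + w_label * d_label
    best = int(np.argmin(score))
    return Y_train[best].astype(int), best


if __name__ == "__main__":
    X = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    Y = [[1, 0, 0], [0, 1, 1], [1, 1, 0]]
    labelset, idx = nldd_predict([0.9, 0.9], [0.1, 0.8, 0.7], X, Y)
    print(idx, labelset.tolist())
```

Because the candidates are restricted to labelsets that occur in `Y_train`, the sketch can only return label combinations seen in training, which is the property the description credits for implicitly capturing label dependencies.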

Similar Items