A hierarchical loss and its problems when classifying non-hierarchically
Failing to distinguish between a sheepdog and a skyscraper should be worse and penalized more than failing to distinguish between a sheepdog and a poodle; after all, sheepdogs and poodles are both breeds of dogs. However, existing metrics of failure (so-called “loss” or “win”) used in textual or visual classification/recognition via neural networks seldom leverage a-priori information, such as a sheepdog being more similar to a poodle than to a skyscraper.
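The record's full description (in the fields below) specifies only that the loss is based on an ultrametric tree in which all leaves lie at the same distance from the root. The Python snippet that follows is a rough, illustrative sketch, not the construction from the paper: it builds one simple ultrametric over a toy class hierarchy and uses it to weight misclassifications so that confusing a sheepdog with a poodle costs less than confusing it with a skyscraper. The toy hierarchy, the 2**(-depth) weighting, and every name in the code are assumptions made for this illustration.

```python
# Illustrative sketch only: one simple way to build an ultrametric over a toy
# class hierarchy and use it to weight classification errors. The hierarchy,
# the 2**(-depth) weighting, and all names below are assumptions made for this
# example; they are not the construction defined by Wu, Tygert, and LeCun.

# Toy hierarchy: leaves are classes; internal nodes group semantically
# similar classes.
HIERARCHY = {
    "root": ["animal", "skyscraper"],
    "animal": ["sheepdog", "poodle"],
}


def path_to(leaf, node="root", prefix=()):
    """Return the root-to-leaf path as a tuple of node names, or None."""
    prefix = prefix + (node,)
    if node == leaf:
        return prefix
    for child in HIERARCHY.get(node, ()):
        found = path_to(leaf, child, prefix)
        if found is not None:
            return found
    return None


def ultrametric_distance(a, b):
    """Distance 2**(-depth of the deepest common ancestor), or 0 when a == b.

    Because the distance between distinct leaves depends only on how deep
    their deepest common ancestor sits, it satisfies the strong (ultrametric)
    triangle inequality: d(a, c) <= max(d(a, b), d(b, c)).
    """
    if a == b:
        return 0.0
    shared = 0
    for x, y in zip(path_to(a), path_to(b)):
        if x != y:
            break
        shared += 1
    depth_of_lca = shared - 1  # the root has depth 0
    return 2.0 ** (-depth_of_lca)


def hierarchical_penalty(true_label, predicted_probs):
    """Expected ultrametric distance between the true class and a class drawn
    from the classifier's predicted distribution (lower is better)."""
    return sum(p * ultrametric_distance(true_label, c)
               for c, p in predicted_probs.items())


if __name__ == "__main__":
    # Confusing a sheepdog with a poodle costs half as much as confusing it
    # with a skyscraper under this toy ultrametric.
    print(ultrametric_distance("sheepdog", "poodle"))       # 0.5
    print(ultrametric_distance("sheepdog", "skyscraper"))   # 1.0
    print(hierarchical_penalty(
        "sheepdog", {"sheepdog": 0.6, "poodle": 0.3, "skyscraper": 0.1}))  # 0.25
```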
Main Authors: | Wu, Cinna, Tygert, Mark, LeCun, Yann |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2019 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6922344/ https://www.ncbi.nlm.nih.gov/pubmed/31856228 http://dx.doi.org/10.1371/journal.pone.0226222 |
_version_ | 1783481316871766016 |
---|---|
author | Wu, Cinna Tygert, Mark LeCun, Yann |
author_facet | Wu, Cinna Tygert, Mark LeCun, Yann |
author_sort | Wu, Cinna |
collection | PubMed |
description | Failing to distinguish between a sheepdog and a skyscraper should be worse and penalized more than failing to distinguish between a sheepdog and a poodle; after all, sheepdogs and poodles are both breeds of dogs. However, existing metrics of failure (so-called “loss” or “win”) used in textual or visual classification/recognition via neural networks seldom leverage a-priori information, such as a sheepdog being more similar to a poodle than to a skyscraper. We define a metric that, inter alia, can penalize failure to distinguish between a sheepdog and a skyscraper more than failure to distinguish between a sheepdog and a poodle. Unlike previously employed possibilities, this metric is based on an ultrametric tree associated with any given tree organization into a semantically meaningful hierarchy of a classifier’s classes. An ultrametric tree is a tree with a so-called ultrametric distance metric such that all leaves are at the same distance from the root. Unfortunately, extensive numerical experiments indicate that the standard practice of training neural networks via stochastic gradient descent with random starting points often drives down the hierarchical loss nearly as much when minimizing the standard cross-entropy loss as when trying to minimize the hierarchical loss directly. Thus, this hierarchical loss is unreliable as an objective for plain, randomly started stochastic gradient descent to minimize; the main value of the hierarchical loss may be merely as a meaningful metric of success of a classifier. |
format | Online Article Text |
id | pubmed-6922344 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-69223442020-01-07 A hierarchical loss and its problems when classifying non-hierarchically Wu, Cinna Tygert, Mark LeCun, Yann PLoS One Research Article Failing to distinguish between a sheepdog and a skyscraper should be worse and penalized more than failing to distinguish between a sheepdog and a poodle; after all, sheepdogs and poodles are both breeds of dogs. However, existing metrics of failure (so-called “loss” or “win”) used in textual or visual classification/recognition via neural networks seldom leverage a-priori information, such as a sheepdog being more similar to a poodle than to a skyscraper. We define a metric that, inter alia, can penalize failure to distinguish between a sheepdog and a skyscraper more than failure to distinguish between a sheepdog and a poodle. Unlike previously employed possibilities, this metric is based on an ultrametric tree associated with any given tree organization into a semantically meaningful hierarchy of a classifier’s classes. An ultrametric tree is a tree with a so-called ultrametric distance metric such that all leaves are at the same distance from the root. Unfortunately, extensive numerical experiments indicate that the standard practice of training neural networks via stochastic gradient descent with random starting points often drives down the hierarchical loss nearly as much when minimizing the standard cross-entropy loss as when trying to minimize the hierarchical loss directly. Thus, this hierarchical loss is unreliable as an objective for plain, randomly started stochastic gradient descent to minimize; the main value of the hierarchical loss may be merely as a meaningful metric of success of a classifier. Public Library of Science 2019-12-19 /pmc/articles/PMC6922344/ /pubmed/31856228 http://dx.doi.org/10.1371/journal.pone.0226222 Text en © 2019 Wu et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Wu, Cinna Tygert, Mark LeCun, Yann A hierarchical loss and its problems when classifying non-hierarchically |
title | A hierarchical loss and its problems when classifying non-hierarchically |
title_full | A hierarchical loss and its problems when classifying non-hierarchically |
title_fullStr | A hierarchical loss and its problems when classifying non-hierarchically |
title_full_unstemmed | A hierarchical loss and its problems when classifying non-hierarchically |
title_short | A hierarchical loss and its problems when classifying non-hierarchically |
title_sort | hierarchical loss and its problems when classifying non-hierarchically |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6922344/ https://www.ncbi.nlm.nih.gov/pubmed/31856228 http://dx.doi.org/10.1371/journal.pone.0226222 |
work_keys_str_mv | AT wucinna ahierarchicallossanditsproblemswhenclassifyingnonhierarchically AT tygertmark ahierarchicallossanditsproblemswhenclassifyingnonhierarchically AT lecunyann ahierarchicallossanditsproblemswhenclassifyingnonhierarchically AT wucinna hierarchicallossanditsproblemswhenclassifyingnonhierarchically AT tygertmark hierarchicallossanditsproblemswhenclassifyingnonhierarchically AT lecunyann hierarchicallossanditsproblemswhenclassifyingnonhierarchically |