
A hierarchical loss and its problems when classifying non-hierarchically

Failing to distinguish between a sheepdog and a skyscraper should be worse and penalized more than failing to distinguish between a sheepdog and a poodle; after all, sheepdogs and poodles are both breeds of dogs. However, existing metrics of failure (so-called “loss” or “win”) used in textual or visual classification/recognition via neural networks seldom leverage a-priori information, such as a sheepdog being more similar to a poodle than to a skyscraper. We define a metric that, inter alia, can penalize failure to distinguish between a sheepdog and a skyscraper more than failure to distinguish between a sheepdog and a poodle. Unlike previously employed possibilities, this metric is based on an ultrametric tree associated with any given tree organization into a semantically meaningful hierarchy of a classifier’s classes. An ultrametric tree is a tree with a so-called ultrametric distance metric such that all leaves are at the same distance from the root. Unfortunately, extensive numerical experiments indicate that the standard practice of training neural networks via stochastic gradient descent with random starting points often drives down the hierarchical loss nearly as much when minimizing the standard cross-entropy loss as when trying to minimize the hierarchical loss directly. Thus, this hierarchical loss is unreliable as an objective for plain, randomly started stochastic gradient descent to minimize; the main value of the hierarchical loss may be merely as a meaningful metric of success of a classifier.
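As a concrete illustration of the ultrametric idea in the abstract, the short Python sketch below computes a tree distance between classes in a toy hierarchy whose leaves all sit at the same distance from the root. The hierarchy, class names, and scaling are assumptions chosen for illustration; this is not the loss function defined in the paper, only the kind of hierarchy-aware distance it builds on.

# A minimal sketch (not the authors' construction) of penalizing confusions
# by distance in an ultrametric tree: every leaf is the same distance from
# the root, so the distance between two classes depends only on how deep
# their lowest common ancestor sits. The toy hierarchy below is assumed.

# Parent pointers for a toy hierarchy of height 2: root -> group -> leaf.
PARENT = {
    "sheepdog": "dog",
    "poodle": "dog",
    "skyscraper": "building",
    "dog": "root",
    "building": "root",
}

def depth(node: str) -> int:
    """Number of edges from the root down to `node`."""
    d = 0
    while node != "root":
        node = PARENT[node]
        d += 1
    return d

def lca_depth(a: str, b: str) -> int:
    """Depth of the lowest common ancestor of leaves `a` and `b`."""
    ancestors_a = set()
    node = a
    while node != "root":
        ancestors_a.add(node)
        node = PARENT[node]
    node = b
    while node != "root" and node not in ancestors_a:
        node = PARENT[node]
    return depth(node) if node in ancestors_a else 0

def ultrametric_distance(a: str, b: str, height: int = 2) -> int:
    """Tree distance in an ultrametric tree whose leaves all sit at `height`."""
    if a == b:
        return 0
    return 2 * (height - lca_depth(a, b))

# Confusing a sheepdog with a poodle (same parent) costs less than confusing
# a sheepdog with a skyscraper (common ancestor only at the root).
print(ultrametric_distance("sheepdog", "poodle"))      # 2
print(ultrametric_distance("sheepdog", "skyscraper"))  # 4

Confusing two dog breeds yields a smaller distance than confusing a dog with a skyscraper, which is exactly the ordering the abstract asks a loss to respect.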

Bibliographic Details
Main Authors: Wu, Cinna, Tygert, Mark, LeCun, Yann
Format: Online Article Text
Language: English
Published: Public Library of Science, 2019
Journal: PLoS One
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6922344/
https://www.ncbi.nlm.nih.gov/pubmed/31856228
http://dx.doi.org/10.1371/journal.pone.0226222
License: © 2019 Wu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.