A hierarchical loss and its problems when classifying non-hierarchically
Failing to distinguish between a sheepdog and a skyscraper should be worse and penalized more than failing to distinguish between a sheepdog and a poodle; after all, sheepdogs and poodles are both breeds of dogs. However, existing metrics of failure (so-called “loss” or “win”) used in textual or visual classification/recognition via neural networks seldom leverage a-priori information, such as a sheepdog being more similar to a poodle than to a skyscraper.
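The record's full description (in the fields below) specifies only that the loss is based on an ultrametric tree in which all leaves lie at the same distance from the root. The Python snippet that follows is a rough, illustrative sketch, not the construction from the paper: it builds one simple ultrametric over a toy class hierarchy and uses it to weight misclassifications so that confusing a sheepdog with a poodle costs less than confusing it with a skyscraper. The toy hierarchy, the 2**(-depth) weighting, and every name in the code are assumptions made for this illustration.

```python
# Illustrative sketch only: one simple way to build an ultrametric over a toy
# class hierarchy and use it to weight classification errors. The hierarchy,
# the 2**(-depth) weighting, and all names below are assumptions made for this
# example; they are not the construction defined by Wu, Tygert, and LeCun.

# Toy hierarchy: leaves are classes; internal nodes group semantically
# similar classes.
HIERARCHY = {
    "root": ["animal", "skyscraper"],
    "animal": ["sheepdog", "poodle"],
}


def path_to(leaf, node="root", prefix=()):
    """Return the root-to-leaf path as a tuple of node names, or None."""
    prefix = prefix + (node,)
    if node == leaf:
        return prefix
    for child in HIERARCHY.get(node, ()):
        found = path_to(leaf, child, prefix)
        if found is not None:
            return found
    return None


def ultrametric_distance(a, b):
    """Distance 2**(-depth of the deepest common ancestor), or 0 when a == b.

    Because the distance between distinct leaves depends only on how deep
    their deepest common ancestor sits, it satisfies the strong (ultrametric)
    triangle inequality: d(a, c) <= max(d(a, b), d(b, c)).
    """
    if a == b:
        return 0.0
    shared = 0
    for x, y in zip(path_to(a), path_to(b)):
        if x != y:
            break
        shared += 1
    depth_of_lca = shared - 1  # the root has depth 0
    return 2.0 ** (-depth_of_lca)


def hierarchical_penalty(true_label, predicted_probs):
    """Expected ultrametric distance between the true class and a class drawn
    from the classifier's predicted distribution (lower is better)."""
    return sum(p * ultrametric_distance(true_label, c)
               for c, p in predicted_probs.items())


if __name__ == "__main__":
    # Confusing a sheepdog with a poodle costs half as much as confusing it
    # with a skyscraper under this toy ultrametric.
    print(ultrametric_distance("sheepdog", "poodle"))       # 0.5
    print(ultrametric_distance("sheepdog", "skyscraper"))   # 1.0
    print(hierarchical_penalty(
        "sheepdog", {"sheepdog": 0.6, "poodle": 0.3, "skyscraper": 0.1}))  # 0.25
```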
Main Authors: | Wu, Cinna, Tygert, Mark, LeCun, Yann |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2019 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6922344/ https://www.ncbi.nlm.nih.gov/pubmed/31856228 http://dx.doi.org/10.1371/journal.pone.0226222 |
_version_ | 1783481316871766016 |
---|---|
author | Wu, Cinna Tygert, Mark LeCun, Yann |
author_facet | Wu, Cinna Tygert, Mark LeCun, Yann |
author_sort | Wu, Cinna |
collection | PubMed |
description | Failing to distinguish between a sheepdog and a skyscraper should be worse and penalized more than failing to distinguish between a sheepdog and a poodle; after all, sheepdogs and poodles are both breeds of dogs. However, existing metrics of failure (so-called “loss” or “win”) used in textual or visual classification/recognition via neural networks seldom leverage a-priori information, such as a sheepdog being more similar to a poodle than to a skyscraper. We define a metric that, inter alia, can penalize failure to distinguish between a sheepdog and a skyscraper more than failure to distinguish between a sheepdog and a poodle. Unlike previously employed possibilities, this metric is based on an ultrametric tree associated with any given tree organization into a semantically meaningful hierarchy of a classifier’s classes. An ultrametric tree is a tree with a so-called ultrametric distance metric such that all leaves are at the same distance from the root. Unfortunately, extensive numerical experiments indicate that the standard practice of training neural networks via stochastic gradient descent with random starting points often drives down the hierarchical loss nearly as much when minimizing the standard cross-entropy loss as when trying to minimize the hierarchical loss directly. Thus, this hierarchical loss is unreliable as an objective for plain, randomly started stochastic gradient descent to minimize; the main value of the hierarchical loss may be merely as a meaningful metric of success of a classifier. |
format | Online Article Text |
id | pubmed-6922344 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-69223442020-01-07 A hierarchical loss and its problems when classifying non-hierarchically Wu, Cinna Tygert, Mark LeCun, Yann PLoS One Research Article Failing to distinguish between a sheepdog and a skyscraper should be worse and penalized more than failing to distinguish between a sheepdog and a poodle; after all, sheepdogs and poodles are both breeds of dogs. However, existing metrics of failure (so-called “loss” or “win”) used in textual or visual classification/recognition via neural networks seldom leverage a-priori information, such as a sheepdog being more similar to a poodle than to a skyscraper. We define a metric that, inter alia, can penalize failure to distinguish between a sheepdog and a skyscraper more than failure to distinguish between a sheepdog and a poodle. Unlike previously employed possibilities, this metric is based on an ultrametric tree associated with any given tree organization into a semantically meaningful hierarchy of a classifier’s classes. An ultrametric tree is a tree with a so-called ultrametric distance metric such that all leaves are at the same distance from the root. Unfortunately, extensive numerical experiments indicate that the standard practice of training neural networks via stochastic gradient descent with random starting points often drives down the hierarchical loss nearly as much when minimizing the standard cross-entropy loss as when trying to minimize the hierarchical loss directly. Thus, this hierarchical loss is unreliable as an objective for plain, randomly started stochastic gradient descent to minimize; the main value of the hierarchical loss may be merely as a meaningful metric of success of a classifier. Public Library of Science 2019-12-19 /pmc/articles/PMC6922344/ /pubmed/31856228 http://dx.doi.org/10.1371/journal.pone.0226222 Text en © 2019 Wu et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Wu, Cinna Tygert, Mark LeCun, Yann A hierarchical loss and its problems when classifying non-hierarchically |
title | A hierarchical loss and its problems when classifying non-hierarchically |
title_full | A hierarchical loss and its problems when classifying non-hierarchically |
title_fullStr | A hierarchical loss and its problems when classifying non-hierarchically |
title_full_unstemmed | A hierarchical loss and its problems when classifying non-hierarchically |
title_short | A hierarchical loss and its problems when classifying non-hierarchically |
title_sort | hierarchical loss and its problems when classifying non-hierarchically |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6922344/ https://www.ncbi.nlm.nih.gov/pubmed/31856228 http://dx.doi.org/10.1371/journal.pone.0226222 |
work_keys_str_mv | AT wucinna ahierarchicallossanditsproblemswhenclassifyingnonhierarchically AT tygertmark ahierarchicallossanditsproblemswhenclassifyingnonhierarchically AT lecunyann ahierarchicallossanditsproblemswhenclassifyingnonhierarchically AT wucinna hierarchicallossanditsproblemswhenclassifyingnonhierarchically AT tygertmark hierarchicallossanditsproblemswhenclassifyingnonhierarchically AT lecunyann hierarchicallossanditsproblemswhenclassifyingnonhierarchically |