
A Random Categorization Model for Hierarchical Taxonomies

A taxonomy is a standardized framework to classify and organize items into categories. Hierarchical taxonomies are ubiquitous, ranging from the classification of organisms to the file system on a computer. Characterizing the typical distribution of items within taxonomic categories is an important q...


Bibliographic Details
Main authors: D'Amico, Guido; Rabadan, Raul; Kleban, Matthew
Language: eng
Published: 2017
Subjects:
Online access: https://dx.doi.org/10.1038/s41598-017-17168-6
http://cds.cern.ch/record/2319801
_version_ 1780958457219776512
author D'Amico, Guido
Rabadan, Raul
Kleban, Matthew
author_facet D'Amico, Guido
Rabadan, Raul
Kleban, Matthew
author_sort D'Amico, Guido
collection CERN
description A taxonomy is a standardized framework to classify and organize items into categories. Hierarchical taxonomies are ubiquitous, ranging from the classification of organisms to the file system on a computer. Characterizing the typical distribution of items within taxonomic categories is an important question with applications in many disciplines. Ecologists have long sought to account for the patterns observed in species-abundance distributions (the number of individuals per species found in some sample), and computer scientists study the distribution of files per directory. Is there a universal statistical distribution describing how many items are typically found in each category in large taxonomies? Here, we analyze a wide array of large, real-world datasets – including items lost and found on the New York City transit system, library books, and a bacterial microbiome – and discover such an underlying commonality. A simple, non-parametric branching model that randomly categorizes items and takes as input only the total number of items and the total number of categories is quite successful in reproducing the observed abundance distributions. This result may shed light on patterns in species-abundance distributions long observed in ecology. The model also predicts the number of taxonomic categories that remain unrepresented in a finite sample.
id oai-inspirehep.net-1673822
institution European Organization for Nuclear Research (CERN)
language eng
publishDate 2017
record_format invenio
title A Random Categorization Model for Hierarchical Taxonomies
topic Other
url https://dx.doi.org/10.1038/s41598-017-17168-6
http://cds.cern.ch/record/2319801