A Random Categorization Model for Hierarchical Taxonomies
A taxonomy is a standardized framework to classify and organize items into categories. Hierarchical taxonomies are ubiquitous, ranging from the classification of organisms to the file system on a computer. Characterizing the typical distribution of items within taxonomic categories is an important q...
Main authors: | D’Amico, Guido; Rabadan, Raul; Kleban, Matthew |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5719047/ https://www.ncbi.nlm.nih.gov/pubmed/29213056 http://dx.doi.org/10.1038/s41598-017-17168-6 |
_version_ | 1783284423408484352 |
---|---|
author | D’Amico, Guido Rabadan, Raul Kleban, Matthew |
author_facet | D’Amico, Guido Rabadan, Raul Kleban, Matthew |
author_sort | D’Amico, Guido |
collection | PubMed |
description | A taxonomy is a standardized framework to classify and organize items into categories. Hierarchical taxonomies are ubiquitous, ranging from the classification of organisms to the file system on a computer. Characterizing the typical distribution of items within taxonomic categories is an important question with applications in many disciplines. Ecologists have long sought to account for the patterns observed in species-abundance distributions (the number of individuals per species found in some sample), and computer scientists study the distribution of files per directory. Is there a universal statistical distribution describing how many items are typically found in each category in large taxonomies? Here, we analyze a wide array of large, real-world datasets – including items lost and found on the New York City transit system, library books, and a bacterial microbiome – and discover such an underlying commonality. A simple, non-parametric branching model that randomly categorizes items and takes as input only the total number of items and the total number of categories is quite successful in reproducing the observed abundance distributions. This result may shed light on patterns in species-abundance distributions long observed in ecology. The model also predicts the number of taxonomic categories that remain unrepresented in a finite sample. |
format | Online Article Text |
id | pubmed-5719047 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-57190472017-12-08 A Random Categorization Model for Hierarchical Taxonomies D’Amico, Guido Rabadan, Raul Kleban, Matthew Sci Rep Article A taxonomy is a standardized framework to classify and organize items into categories. Hierarchical taxonomies are ubiquitous, ranging from the classification of organisms to the file system on a computer. Characterizing the typical distribution of items within taxonomic categories is an important question with applications in many disciplines. Ecologists have long sought to account for the patterns observed in species-abundance distributions (the number of individuals per species found in some sample), and computer scientists study the distribution of files per directory. Is there a universal statistical distribution describing how many items are typically found in each category in large taxonomies? Here, we analyze a wide array of large, real-world datasets – including items lost and found on the New York City transit system, library books, and a bacterial microbiome – and discover such an underlying commonality. A simple, non-parametric branching model that randomly categorizes items and takes as input only the total number of items and the total number of categories is quite successful in reproducing the observed abundance distributions. This result may shed light on patterns in species-abundance distributions long observed in ecology. The model also predicts the number of taxonomic categories that remain unrepresented in a finite sample. 
Nature Publishing Group UK 2017-12-06 /pmc/articles/PMC5719047/ /pubmed/29213056 http://dx.doi.org/10.1038/s41598-017-17168-6 Text en © The Author(s) 2017 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article D’Amico, Guido Rabadan, Raul Kleban, Matthew A Random Categorization Model for Hierarchical Taxonomies |
title | A Random Categorization Model for Hierarchical Taxonomies |
title_full | A Random Categorization Model for Hierarchical Taxonomies |
title_fullStr | A Random Categorization Model for Hierarchical Taxonomies |
title_full_unstemmed | A Random Categorization Model for Hierarchical Taxonomies |
title_short | A Random Categorization Model for Hierarchical Taxonomies |
title_sort | random categorization model for hierarchical taxonomies |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5719047/ https://www.ncbi.nlm.nih.gov/pubmed/29213056 http://dx.doi.org/10.1038/s41598-017-17168-6 |
work_keys_str_mv | AT damicoguido arandomcategorizationmodelforhierarchicaltaxonomies AT rabadanraul arandomcategorizationmodelforhierarchicaltaxonomies AT klebanmatthew arandomcategorizationmodelforhierarchicaltaxonomies AT damicoguido randomcategorizationmodelforhierarchicaltaxonomies AT rabadanraul randomcategorizationmodelforhierarchicaltaxonomies AT klebanmatthew randomcategorizationmodelforhierarchicaltaxonomies |