
Genome Sizes and the Benford Distribution

BACKGROUND: Data on the number of Open Reading Frames (ORFs) coded by genomes from the 3 domains of Life show the presence of some notable general features. These include essential differences between the Prokaryotes and Eukaryotes, with the number of ORFs growing linearly with total genome size for...


Bibliographic Details
Main authors: Friar, James L.; Goldman, Terrance; Pérez–Mercader, Juan
Format: Online Article Text
Language: English
Published: Public Library of Science 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3356352/
https://www.ncbi.nlm.nih.gov/pubmed/22629319
http://dx.doi.org/10.1371/journal.pone.0036624
author Friar, James L.
Goldman, Terrance
Pérez–Mercader, Juan
collection PubMed
description BACKGROUND: Data on the number of Open Reading Frames (ORFs) coded by genomes from the 3 domains of Life show the presence of some notable general features. These include essential differences between the Prokaryotes and Eukaryotes, with the number of ORFs growing linearly with total genome size for the former, but only logarithmically for the latter. RESULTS: Simply by assuming that the (protein) coding and non-coding fractions of the genome must have different dynamics and that the non-coding fraction must be particularly versatile and therefore be controlled by a variety of (unspecified) probability distribution functions (pdf’s), we are able to predict that the number of ORFs for Eukaryotes follows a Benford distribution and must therefore have a specific logarithmic form. Using the data for the 1000+ genomes available to us in early 2010, we find that the Benford distribution provides excellent fits to the data over several orders of magnitude. CONCLUSIONS: In its linear regime the Benford distribution produces excellent fits to the Prokaryote data, while the full non-linear form of the distribution similarly provides an excellent fit to the Eukaryote data. Furthermore, in their region of overlap the salient features are statistically congruent. This allows us to interpret the difference between Prokaryotes and Eukaryotes as the manifestation of the increased demand in the biological functions required for the larger Eukaryotes, to estimate some minimal genome sizes, and to predict a maximal Prokaryote genome size on the order of 8–12 megabasepairs. These results naturally allow a mathematical interpretation in terms of maximal entropy and, therefore, most efficient information transmission.
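The abstract contrasts a "linear regime" with a "full non-linear form" of the same curve but does not state the functional form in this record. As a rough illustration only, the sketch below assumes a generic curve N(G) = a * ln(1 + G / g0), which is approximately linear for G much smaller than g0 and logarithmic for G much larger than g0. The function names, the parameters `a` and `g0`, and the numeric values are all hypothetical and are not taken from the paper's fits.

```python
import math


def orf_count(genome_size, a, g0):
    """Hypothetical Benford-type curve: a * ln(1 + G / g0).

    This functional form is an assumption made for illustration;
    it is not quoted from the article.
    """
    return a * math.log1p(genome_size / g0)


def linear_approx(genome_size, a, g0):
    """First-order expansion of orf_count, valid when G << g0."""
    return a * genome_size / g0


# Illustrative values only (not fitted to any data).
A = 1000.0   # overall scale of the ORF count
G0 = 1.0e6   # crossover genome size in base pairs

if __name__ == "__main__":
    for g in (1e4, 1e5, 1e6, 1e7, 1e8, 1e9):
        full = orf_count(g, A, G0)
        lin = linear_approx(g, A, G0)
        print(f"G = {g:10.0f} bp   full = {full:10.2f}   linear = {lin:12.2f}")
```

Under these assumed values the two columns agree closely for small genomes and diverge once G exceeds g0, which is the qualitative behaviour the abstract attributes to Prokaryotes (linear) and Eukaryotes (logarithmic).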
format Online
Article
Text
id pubmed-3356352
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3356352 2012-05-24 Genome Sizes and the Benford Distribution. Friar, James L.; Goldman, Terrance; Pérez–Mercader, Juan. PLoS One. Research Article.
Public Library of Science 2012-05-18 /pmc/articles/PMC3356352/ /pubmed/22629319 http://dx.doi.org/10.1371/journal.pone.0036624 Text en Friar et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Genome Sizes and the Benford Distribution
topic Research Article