
The Random Nature of Genome Architecture: Predicting Open Reading Frame Distributions

BACKGROUND: A better understanding of the size and abundance of open reading frames (ORFs) in whole genomes may shed light on the factors that control genome complexity. Here we examine the statistical distributions of open reading frames (i.e., the distribution of start and stop codons) in the fully seq...


Bibliographic Details
Main authors: McCoy, Michael W., Allen, Andrew P., Gillooly, James F.
Format: Text
Language: English
Published: Public Library of Science 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2714469/
https://www.ncbi.nlm.nih.gov/pubmed/19649247
http://dx.doi.org/10.1371/journal.pone.0006456
collection PubMed
description BACKGROUND: A better understanding of the size and abundance of open reading frames (ORFs) in whole genomes may shed light on the factors that control genome complexity. Here we examine the statistical distributions of open reading frames (i.e., the distribution of start and stop codons) in the fully sequenced genomes of 297 prokaryotes and 14 eukaryotes. METHODOLOGY/PRINCIPAL FINDINGS: By fitting mixture models to data from whole-genome sequences, we show that the size-frequency distributions for ORFs are strikingly similar across prokaryotic and eukaryotic genomes. Moreover, we show that (i) a large fraction (60–80%) of ORF size-frequency distributions can be predicted a priori with a stochastic assembly model based on GC content, and that (ii) size-frequency distributions of the remaining “non-random” ORFs are well fitted by log-normal or gamma distributions, and are similar to the size distributions of annotated proteins. CONCLUSIONS/SIGNIFICANCE: Our findings suggest that stochastic processes have played a primary role in the evolution of genome complexity, and that common processes govern the conservation and loss of functional genomics units in both prokaryotes and eukaryotes.
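The abstract does not spell out the stochastic assembly model, so the following is only a minimal sketch of the general idea that GC content alone can predict a distribution of random ORF sizes. It assumes (our assumption, not the record's) that nucleotides are drawn independently, with G and C each at frequency gc/2 and A and T each at (1 - gc)/2, and that the three standard stop codons TAA, TAG and TGA terminate a reading frame. Under those assumptions the ORF length in codons is geometric. The function names are hypothetical and this is not presented as the authors' implementation.

```python
# Illustrative sketch only: a simple independent-nucleotide null model,
# NOT the model fitted in the paper (its details are not given in this record).

def stop_codon_probability(gc: float) -> float:
    """Probability that a random codon is TAA, TAG or TGA at a given GC content."""
    if not 0.0 <= gc <= 1.0:
        raise ValueError("gc must be between 0 and 1")
    g = gc / 2.0          # assumed P(G) = P(C) = gc / 2
    a = (1.0 - gc) / 2.0  # assumed P(A) = P(T) = (1 - gc) / 2
    t = a
    return t * a * a + t * a * g + t * g * a  # TAA + TAG + TGA


def orf_length_pmf(gc: float, n_codons: int) -> float:
    """Probability that the first stop codon falls at codon position n_codons (>= 1)."""
    if n_codons < 1:
        raise ValueError("n_codons must be at least 1")
    p = stop_codon_probability(gc)
    return (1.0 - p) ** (n_codons - 1) * p  # geometric distribution


def expected_orf_length(gc: float) -> float:
    """Mean number of codons up to and including the first stop codon."""
    return 1.0 / stop_codon_probability(gc)


if __name__ == "__main__":
    for gc in (0.3, 0.5, 0.7):
        print(f"GC={gc:.1f}  P(stop)={stop_codon_probability(gc):.4f}  "
              f"mean length={expected_orf_length(gc):.1f} codons")
```

In this toy model, GC-rich sequences contain fewer stop codons by chance (the stop codons are AT-rich), so random ORFs are expected to be longer as GC content rises.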
format Text
id pubmed-2714469
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2714469 2009-08-01 The Random Nature of Genome Architecture: Predicting Open Reading Frame Distributions. McCoy, Michael W.; Allen, Andrew P.; Gillooly, James F. PLoS One. Research Article. Public Library of Science 2009-07-30 /pmc/articles/PMC2714469/ /pubmed/19649247 http://dx.doi.org/10.1371/journal.pone.0006456 Text en McCoy et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.