
Kullback Leibler divergence in complete bacterial and phage genomes

The amino acid content of the proteins encoded by a genome may predict the coding potential of that genome and may reflect lifestyle restrictions of the organism. Here, we calculated the Kullback–Leibler divergence from the mean amino acid content as a metric to compare the amino acid composition fo...
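The metric named in the abstract, the Kullback–Leibler divergence of a genome's amino acid composition from the mean composition, can be illustrated with a minimal sketch. This is not the authors' code. The direction of the divergence (genome profile P against mean profile Q), the base-2 logarithm, the unweighted mean across genomes, and the toy sequences are all assumptions made for illustration, and every function name is hypothetical.

```python
import math
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_frequencies(proteins):
    """Relative frequency of each standard amino acid over a set of proteins."""
    counts = Counter()
    for seq in proteins:
        # Ignore ambiguous or non-standard symbols such as X or *.
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def mean_frequencies(profiles):
    """Unweighted mean profile across genomes (assumed reference distribution)."""
    n = len(profiles)
    return {aa: sum(p[aa] for p in profiles) / n for aa in AMINO_ACIDS}


def kl_divergence(p, q):
    """D(P || Q) in bits. Terms with p_i == 0 contribute 0 by convention.

    Requires q_i > 0 wherever p_i > 0, which holds when q is the mean of a
    set of profiles that includes p.
    """
    return sum(
        p[aa] * math.log2(p[aa] / q[aa]) for aa in AMINO_ACIDS if p[aa] > 0
    )


# Toy data (hypothetical): protein sequences grouped by genome.
genomes = {
    "genome_a": ["MKVLAAGIVG", "MSTNPKPQRK"],
    "genome_b": ["MKKKNNNIIK", "MNKIYNKKIN"],
}

profiles = {name: amino_acid_frequencies(seqs) for name, seqs in genomes.items()}
mean_profile = mean_frequencies(list(profiles.values()))
divergences = {
    name: kl_divergence(profile, mean_profile)
    for name, profile in profiles.items()
}

for name, value in divergences.items():
    print(f"{name}: {value:.4f} bits")
```

A genome whose composition matches the mean scores 0, and larger values indicate a more skewed amino acid utilization profile under these assumptions.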


Bibliographic Details
Main authors: Akhter, Sajia, Aziz, Ramy K., Kashef, Mona T., Ibrahim, Eslam S., Bailey, Barbara, Edwards, Robert A.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5712468/
https://www.ncbi.nlm.nih.gov/pubmed/29204318
http://dx.doi.org/10.7717/peerj.4026
_version_ 1783283225411452928
author Akhter, Sajia
Aziz, Ramy K.
Kashef, Mona T.
Ibrahim, Eslam S.
Bailey, Barbara
Edwards, Robert A.
author_facet Akhter, Sajia
Aziz, Ramy K.
Kashef, Mona T.
Ibrahim, Eslam S.
Bailey, Barbara
Edwards, Robert A.
author_sort Akhter, Sajia
collection PubMed
description The amino acid content of the proteins encoded by a genome may predict the coding potential of that genome and may reflect lifestyle restrictions of the organism. Here, we calculated the Kullback–Leibler divergence from the mean amino acid content as a metric to compare the amino acid composition for a large set of bacterial and phage genome sequences. Using these data, we demonstrate that (i) there is a significant difference between amino acid utilization in different phylogenetic groups of bacteria and phages; (ii) many of the bacteria with the most skewed amino acid utilization profiles, or the bacteria that host phages with the most skewed profiles, are endosymbionts or parasites; (iii) the skews in the distribution are not restricted to certain metabolic processes but are common across all bacterial genomic subsystems; (iv) amino acid utilization profiles strongly correlate with GC content in bacterial genomes but very weakly correlate with the G+C percent in phage genomes. These findings might be exploited to distinguish coding from non-coding sequences in large data sets, such as metagenomic sequence libraries, to help in prioritizing subsequent analyses.
format Online
Article
Text
id pubmed-5712468
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-57124682017-12-04 Kullback Leibler divergence in complete bacterial and phage genomes Akhter, Sajia Aziz, Ramy K. Kashef, Mona T. Ibrahim, Eslam S. Bailey, Barbara Edwards, Robert A. PeerJ Bioinformatics The amino acid content of the proteins encoded by a genome may predict the coding potential of that genome and may reflect lifestyle restrictions of the organism. Here, we calculated the Kullback–Leibler divergence from the mean amino acid content as a metric to compare the amino acid composition for a large set of bacterial and phage genome sequences. Using these data, we demonstrate that (i) there is a significant difference between amino acid utilization in different phylogenetic groups of bacteria and phages; (ii) many of the bacteria with the most skewed amino acid utilization profiles, or the bacteria that host phages with the most skewed profiles, are endosymbionts or parasites; (iii) the skews in the distribution are not restricted to certain metabolic processes but are common across all bacterial genomic subsystems; (iv) amino acid utilization profiles strongly correlate with GC content in bacterial genomes but very weakly correlate with the G+C percent in phage genomes. These findings might be exploited to distinguish coding from non-coding sequences in large data sets, such as metagenomic sequence libraries, to help in prioritizing subsequent analyses. PeerJ Inc. 2017-11-30 /pmc/articles/PMC5712468/ /pubmed/29204318 http://dx.doi.org/10.7717/peerj.4026 Text en ©2017 Akhter et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. 
For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
spellingShingle Bioinformatics
Akhter, Sajia
Aziz, Ramy K.
Kashef, Mona T.
Ibrahim, Eslam S.
Bailey, Barbara
Edwards, Robert A.
Kullback Leibler divergence in complete bacterial and phage genomes
title Kullback Leibler divergence in complete bacterial and phage genomes
title_full Kullback Leibler divergence in complete bacterial and phage genomes
title_fullStr Kullback Leibler divergence in complete bacterial and phage genomes
title_full_unstemmed Kullback Leibler divergence in complete bacterial and phage genomes
title_short Kullback Leibler divergence in complete bacterial and phage genomes
title_sort kullback leibler divergence in complete bacterial and phage genomes
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5712468/
https://www.ncbi.nlm.nih.gov/pubmed/29204318
http://dx.doi.org/10.7717/peerj.4026