Kullback Leibler divergence in complete bacterial and phage genomes
The amino acid content of the proteins encoded by a genome may predict the coding potential of that genome and may reflect lifestyle restrictions of the organism. Here, we calculated the Kullback–Leibler divergence from the mean amino acid content as a metric to compare the amino acid composition fo...
Main authors: | Akhter, Sajia; Aziz, Ramy K.; Kashef, Mona T.; Ibrahim, Eslam S.; Bailey, Barbara; Edwards, Robert A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2017 |
Subjects: | Bioinformatics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5712468/ https://www.ncbi.nlm.nih.gov/pubmed/29204318 http://dx.doi.org/10.7717/peerj.4026 |
_version_ | 1783283225411452928 |
---|---|
author | Akhter, Sajia; Aziz, Ramy K.; Kashef, Mona T.; Ibrahim, Eslam S.; Bailey, Barbara; Edwards, Robert A. |
author_facet | Akhter, Sajia; Aziz, Ramy K.; Kashef, Mona T.; Ibrahim, Eslam S.; Bailey, Barbara; Edwards, Robert A. |
author_sort | Akhter, Sajia |
collection | PubMed |
description | The amino acid content of the proteins encoded by a genome may predict the coding potential of that genome and may reflect lifestyle restrictions of the organism. Here, we calculated the Kullback–Leibler divergence from the mean amino acid content as a metric to compare the amino acid composition for a large set of bacterial and phage genome sequences. Using these data, we demonstrate that (i) there is a significant difference between amino acid utilization in different phylogenetic groups of bacteria and phages; (ii) many of the bacteria with the most skewed amino acid utilization profiles, or the bacteria that host phages with the most skewed profiles, are endosymbionts or parasites; (iii) the skews in the distribution are not restricted to certain metabolic processes but are common across all bacterial genomic subsystems; (iv) amino acid utilization profiles strongly correlate with GC content in bacterial genomes but very weakly correlate with the G+C percent in phage genomes. These findings might be exploited to distinguish coding from non-coding sequences in large data sets, such as metagenomic sequence libraries, to help in prioritizing subsequent analyses. |
format | Online Article Text |
id | pubmed-5712468 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-5712468 2017-12-04 Kullback Leibler divergence in complete bacterial and phage genomes. Akhter, Sajia; Aziz, Ramy K.; Kashef, Mona T.; Ibrahim, Eslam S.; Bailey, Barbara; Edwards, Robert A. PeerJ. Bioinformatics. PeerJ Inc. 2017-11-30 /pmc/articles/PMC5712468/ /pubmed/29204318 http://dx.doi.org/10.7717/peerj.4026 Text en ©2017 Akhter et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
spellingShingle | Bioinformatics; Akhter, Sajia; Aziz, Ramy K.; Kashef, Mona T.; Ibrahim, Eslam S.; Bailey, Barbara; Edwards, Robert A.; Kullback Leibler divergence in complete bacterial and phage genomes |
title | Kullback Leibler divergence in complete bacterial and phage genomes |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5712468/ https://www.ncbi.nlm.nih.gov/pubmed/29204318 http://dx.doi.org/10.7717/peerj.4026 |
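
The abstract in this record describes scoring each genome by the Kullback–Leibler divergence of its amino acid composition from the mean composition. The record does not give the computational details, so the following is only a minimal Python sketch of that idea. The function names, the toy sequences, the pseudocount, the base-2 logarithm, and the unweighted mean across genomes are all assumptions made for illustration, not the authors' published procedure.

```python
import math
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_frequencies(proteins, pseudocount=1.0):
    """Relative amino acid frequencies over all proteins of one genome.

    The pseudocount is an assumption of this sketch; it keeps every
    frequency above zero so the logarithm below is always defined.
    """
    counts = Counter()
    for seq in proteins:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def mean_frequencies(profiles):
    """Unweighted mean of several frequency profiles (assumed averaging scheme)."""
    n = len(profiles)
    return {a: sum(p[a] for p in profiles) / n for a in AMINO_ACIDS}


def kl_divergence(p, q):
    """D_KL(p || q) in bits: sum over amino acids of p * log2(p / q)."""
    return sum(p[a] * math.log2(p[a] / q[a]) for a in AMINO_ACIDS if p[a] > 0)


# Hypothetical toy "genomes": each is a list of protein sequences.
toy_genomes = {
    "genome_a": ["MKKLLAGG", "MAAAGLV"],
    "genome_b": ["MKKKKNNI", "MNNIKKY"],
}

profiles = {name: aa_frequencies(seqs) for name, seqs in toy_genomes.items()}
mean_profile = mean_frequencies(list(profiles.values()))
scores = {name: kl_divergence(p, mean_profile) for name, p in profiles.items()}

for name, score in scores.items():
    print(f"{name}: {score:.4f} bits")
```

A larger score means a composition further from the mean profile. Under these assumptions, a genome whose composition equals the mean scores zero.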