Applying Shannon's information theory to bacterial and phage genomes and metagenomes

Bibliographic Details
Main authors: Akhter, Sajia; Bailey, Barbara A.; Salamon, Peter; Aziz, Ramy K.; Edwards, Robert A.
Format: Online, Article, Text
Language: English
Published: Nature Publishing Group, 2013
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3539204/
https://www.ncbi.nlm.nih.gov/pubmed/23301154
http://dx.doi.org/10.1038/srep01033
Collection: PubMed
Abstract: All sequence data contain inherent information that can be measured by Shannon's uncertainty theory. Such measurement is valuable in evaluating large data sets, such as metagenomic libraries, to prioritize their analysis and annotation, thus saving computational resources. Here, Shannon's index of complete phage and bacterial genomes was examined. The information content of a genome was found to be highly dependent on the genome length, GC content, and sequence word size. In metagenomic sequences, the amount of information correlated with the number of matches found by comparison to sequence databases. A sequence with more information (higher uncertainty) has a higher probability of being significantly similar to other sequences in the database. Measuring uncertainty may be used for rapid screening for sequences with matches in available databases, prioritizing computational resources, and indicating which sequences with no known similarities are likely to be important for more detailed analysis.
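The abstract does not spell out the exact formulation used in the paper (for example, how word size enters the calculation or whether reverse complements are merged). Purely as an illustration, the sketch below assumes that "Shannon's index" at word size k means the Shannon entropy H = -sum p log2 p of the overlapping k-mer frequency distribution of a sequence. The helper names `kmer_shannon_entropy` and `entropy_upper_bound` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Assumption: only A, C, G, T are counted; k-mers containing anything else (e.g. N) are skipped.
VALID_BASES = frozenset("ACGT")


def kmer_shannon_entropy(seq: str, k: int) -> float:
    """Shannon entropy, in bits, of the overlapping k-mer frequencies of seq.

    Hypothetical helper for illustration: reverse complements are not merged
    and ambiguous bases are dropped, which may differ from the paper's method.
    """
    if k < 1:
        raise ValueError("word size k must be >= 1")
    seq = seq.upper()
    # Count every overlapping word of length k made only of valid bases.
    counts = Counter(
        seq[i : i + k]
        for i in range(len(seq) - k + 1)
        if VALID_BASES.issuperset(seq[i : i + k])
    )
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum p * log2(1/p), equivalent to -sum p * log2(p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def entropy_upper_bound(length: int, k: int) -> float:
    """Largest possible k-mer entropy for a sequence of the given length.

    At most min(4**k, length - k + 1) distinct k-mers can occur, and entropy
    cannot exceed log2 of the number of distinct outcomes.
    """
    n_words = length - k + 1
    if n_words < 1:
        return 0.0
    return math.log2(min(4**k, n_words))


if __name__ == "__main__":
    print(kmer_shannon_entropy("ACGT" * 10, 1))  # 2.0 (all four bases equally frequent)
    print(kmer_shannon_entropy("AT" * 20, 1))    # 1.0 (no G or C, so only two symbols)
    for k in (1, 2, 4, 8):
        print(k, entropy_upper_bound(1000, k))
```

The upper-bound helper illustrates one general reason why measured entropy depends on both sequence length and word size: once 4^k exceeds the number of words actually present, the ceiling is set by length rather than by k. This is a general property of Shannon entropy, not a result reported in the paper.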
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Sci Rep (Nature Publishing Group), published 2013-01-08
Copyright: © 2013, Macmillan Publishers Limited. All rights reserved.
License: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareALike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/