
Sequence space coverage, entropy of genomes and the potential to detect non-human DNA in human samples

BACKGROUND: Genomes store information for building and maintaining organisms. Complete sequencing of many genomes provides the opportunity to study and compare global information properties of those genomes. RESULTS: We have analyzed aspects of the information content of Homo sapiens, Mus musculus,...


Bibliographic Details
Main authors: Liu, Zhandong; Venkatesh, Santosh S; Maley, Carlo C
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2628393/
https://www.ncbi.nlm.nih.gov/pubmed/18973670
http://dx.doi.org/10.1186/1471-2164-9-509
_version_ 1782163693658701824
author Liu, Zhandong
Venkatesh, Santosh S
Maley, Carlo C
collection PubMed
description BACKGROUND: Genomes store information for building and maintaining organisms. Complete sequencing of many genomes provides the opportunity to study and compare global information properties of those genomes. RESULTS: We have analyzed aspects of the information content of Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Saccharomyces cerevisiae, and Escherichia coli (K-12) genomes. Virtually all possible (> 98%) 12 bp oligomers appear in vertebrate genomes while < 2% of 19 bp oligomers are present. Other species showed different ranges of > 98% to < 2% of possible oligomers in D. melanogaster (12–17 bp), C. elegans (11–17 bp), A. thaliana (11–17 bp), S. cerevisiae (10–16 bp) and E. coli (9–15 bp). Frequencies of unique oligomers in the genomes follow similar patterns. We identified a set of 2.6 million 15-mers that differ by more than 1 nucleotide from all 15-mers in the human genome and so could be used as probes to detect microbes in human samples. In a human sample, these probes would detect 100% of the 433 currently fully sequenced prokaryotes and 75% of the 3065 fully sequenced viruses. The human genome is significantly more compact in sequence space than a random genome. We identified the most frequent 5- to 20-mers in the human genome, which may prove useful as PCR primers. We also identified a bacterium, Anaeromyxobacter dehalogenans, which has an exceptionally low diversity of oligomers given the size of its genome and its GC content. The entropy of coding regions in the human genome is significantly higher than that of non-coding regions and chromosomes. However, chromosomes 1, 2, 9, 12 and 14 have a relatively high proportion of coding DNA without high entropy, and chromosome 20 is the opposite, with a low frequency of coding regions but relatively high entropy.
CONCLUSION: Measures of the frequency of oligomers are useful for designing PCR assays and for identifying chromosomes and organisms with hidden structure that had not been previously recognized. This information may be used to detect novel microbes in human tissues.
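The quantities named in the abstract (the fraction of possible oligomers present, the entropy of oligomer frequencies, and oligomers that differ by more than 1 nucleotide from every host oligomer) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the single-strand counting, the Shannon-entropy definition over k-mer frequencies, and the brute-force search are all assumptions made for illustration, and the search is only practical for very small k.

```python
from collections import Counter
from itertools import product
from math import log2

ALPHABET = "ACGT"  # assumption: only unambiguous bases are counted


def count_kmers(seq, k):
    """Count overlapping k-mers on one strand, skipping any with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set(ALPHABET):
            counts[kmer] += 1
    return counts


def coverage(seq, k):
    """Fraction of the 4**k possible k-mers that occur at least once in seq."""
    return len(count_kmers(seq, k)) / 4 ** k


def kmer_entropy(seq, k):
    """Shannon entropy in bits of the observed k-mer frequency distribution."""
    counts = count_kmers(seq, k)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in counts.values())


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def distant_kmers(host_seq, k, min_dist=2):
    """Brute force: k-mers at Hamming distance >= min_dist from every host k-mer.

    min_dist=2 corresponds to "more than 1 nucleotide different".
    """
    host = set(count_kmers(host_seq, k))
    result = []
    for p in product(ALPHABET, repeat=k):
        candidate = "".join(p)
        if all(hamming(candidate, h) >= min_dist for h in host):
            result.append(candidate)
    return result
```

For example, `coverage(genome, 12)` on a genome string would give the kind of fraction the abstract reports as "> 98%" for vertebrates, under the single-strand assumption above. A genome-scale version would need a compact k-mer index rather than Python strings and exhaustive enumeration.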
format Text
id pubmed-2628393
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2628393 2009-01-21 Sequence space coverage, entropy of genomes and the potential to detect non-human DNA in human samples. Liu, Zhandong; Venkatesh, Santosh S; Maley, Carlo C. BMC Genomics. Research Article. BioMed Central 2008-10-30 /pmc/articles/PMC2628393/ /pubmed/18973670 http://dx.doi.org/10.1186/1471-2164-9-509 Text en Copyright © 2008 Liu et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Sequence space coverage, entropy of genomes and the potential to detect non-human DNA in human samples
topic Research Article