Barcodes for genomes and applications

Bibliographic Details
Main Authors: Zhou, Fengfeng; Olman, Victor; Xu, Ying
Format: Text
Language: English
Published: BioMed Central, 2008
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2621371/
https://www.ncbi.nlm.nih.gov/pubmed/19091119
http://dx.doi.org/10.1186/1471-2105-9-546
Collection: PubMed
Description:
BACKGROUND: Each genome has a stable distribution of the combined frequency of each k-mer and its reverse complement, measured in sequence fragments as short as 1,000 bp across the whole genome, for 1 < k < 6. The collection of these k-mer frequency distributions is unique to each genome and is termed the genome's barcode.
RESULTS: We found that, for each genome, the majority of its short sequence fragments have highly similar barcodes, while sequence fragments with different barcodes typically correspond to genes that are horizontally transferred or highly expressed. This observation has led to new and more effective ways of addressing two challenging problems: metagenome binning and the identification of horizontally transferred genes. Our barcode-based metagenome binning algorithm substantially improves on the state of the art in terms of both binning accuracy and scope of applicability. Other attractive properties of genome barcodes include (a) barcodes have distinct, identifiable characteristics for different classes of genomes, such as prokaryotes, eukaryotes, mitochondria and plastids, and (b) barcode similarity is generally proportional to the phylogenetic closeness of the genomes.
CONCLUSION: These and other properties make genome barcodes a new and effective tool for studying numerous genome and metagenome analysis problems.
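The barcode idea in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the paper: a fragment's barcode is modeled as the concatenated frequencies of canonical k-mers (each k-mer pooled with its reverse complement) for k = 2 to 5, i.e. 1 < k < 6; a genome's reference is the mean of its 1,000 bp fragment barcodes; and Euclidean distance, nearest-reference binning and a "mean plus two standard deviations" outlier cutoff are placeholder choices, not the authors' published algorithm. All function names are hypothetical.

```python
import math
import random
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = set("ACGT")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmers(k: int) -> list[str]:
    """One representative (the lexicographically smaller) per {k-mer, reverse complement} pair."""
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return sorted({min(s, reverse_complement(s)) for s in all_kmers})


def fragment_barcode(fragment: str, ks=range(2, 6)) -> list[float]:
    """Hypothetical barcode: combined k-mer + reverse-complement frequencies, concatenated over ks."""
    fragment = fragment.upper()
    vector: list[float] = []
    for k in ks:
        keys = canonical_kmers(k)
        counts = dict.fromkeys(keys, 0)
        total = 0
        for i in range(len(fragment) - k + 1):
            kmer = fragment[i:i + k]
            if set(kmer) <= _BASES:  # skip windows containing N or other symbols
                counts[min(kmer, reverse_complement(kmer))] += 1
                total += 1
        vector.extend(counts[key] / total if total else 0.0 for key in keys)
    return vector


def split_fragments(genome: str, size: int = 1000) -> list[str]:
    """Non-overlapping fragments of exactly `size` bases; a shorter tail is dropped."""
    return [genome[i:i + size] for i in range(0, len(genome) - size + 1, size)]


def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_barcode(barcodes: list[list[float]]) -> list[float]:
    return [sum(column) / len(barcodes) for column in zip(*barcodes)]


def reference_barcode(genome: str, size: int = 1000) -> list[float]:
    """Assumed genome-level reference: the mean of its fragment barcodes."""
    return mean_barcode([fragment_barcode(f) for f in split_fragments(genome, size)])


def atypical_fragments(genome: str, size: int = 1000, z: float = 2.0) -> list[int]:
    """Indices of fragments whose distance to the mean barcode exceeds mean + z * std
    of all fragment distances (a placeholder rule for flagging candidate HGT regions)."""
    barcodes = [fragment_barcode(f) for f in split_fragments(genome, size)]
    center = mean_barcode(barcodes)
    dists = [euclidean(b, center) for b in barcodes]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    return [i for i, d in enumerate(dists) if d > mean + z * std]


def nearest_genome(fragment: str, references: dict[str, list[float]]) -> str:
    """Bin a fragment to the genome whose reference barcode is closest (Euclidean)."""
    barcode = fragment_barcode(fragment)
    return min(references, key=lambda name: euclidean(barcode, references[name]))


if __name__ == "__main__":
    rng = random.Random(0)
    background = ["".join(rng.choice("ACGT") for _ in range(1000)) for _ in range(20)]
    # Insert a compositionally foreign poly-A block as fragment 7.
    genome = "".join(background[:7] + ["A" * 1000] + background[7:])
    print(atypical_fragments(genome))
```

Pooling each k-mer with its reverse complement makes a fragment's barcode independent of the strand it was read from, which matters for metagenomic fragments sequenced from either strand.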
Record ID: pubmed-2621371
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Journal: BMC Bioinformatics (Research Article), BioMed Central, published 2008-12-17
Copyright © 2008 Zhou et al.; licensee BioMed Central Ltd.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Topic: Research Article