Comparison of metagenomic samples using sequence signatures

BACKGROUND: Sequence signatures, as defined by the frequencies of k-tuples (or k-mers, k-grams), have been used extensively to compare genomic sequences of individual organisms, to identify cis-regulatory modules, and to study the evolution of regulatory sequences. Recently many next-generation sequ...

Bibliographic Details
Main Authors: Jiang, Bai; Song, Kai; Ren, Jie; Deng, Minghua; Sun, Fengzhu; Zhang, Xuegong
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3549735/
https://www.ncbi.nlm.nih.gov/pubmed/23268604
http://dx.doi.org/10.1186/1471-2164-13-730
author Jiang, Bai
Song, Kai
Ren, Jie
Deng, Minghua
Sun, Fengzhu
Zhang, Xuegong
author_facet Jiang, Bai
Song, Kai
Ren, Jie
Deng, Minghua
Sun, Fengzhu
Zhang, Xuegong
author_sort Jiang, Bai
collection PubMed
description BACKGROUND: Sequence signatures, defined by the frequencies of k-tuples (also called k-mers or k-grams), have been used extensively to compare genomic sequences of individual organisms, to identify cis-regulatory modules, and to study the evolution of regulatory sequences. Recently, many next-generation sequencing (NGS) read data sets of metagenomic samples from a variety of environments have been generated. Assembling these reads can be difficult, and analysis methods based on mapping reads to genes or pathways are restricted by the availability and completeness of existing databases. Sequence-signature-based methods, however, need neither complete genomes nor existing databases and thus can potentially be very useful for comparing metagenomic samples using NGS read data. Still, the application of sequence signature methods to the comparison of metagenomic samples has not been well studied.
RESULTS: We studied several dissimilarity measures for comparing metagenomic samples using sequence signatures: $d_2$, $d_2^*$ and $d_2^S$, recently developed by our group; a measure (hereinafter denoted Hao) used in CVTree, developed by Hao’s group (Qi et al., 2004); measures based on relative di-, tri-, and tetra-nucleotide frequencies as in Willner et al. (2009); and standard $l_p$ measures between the frequency vectors. We compared their performance using a series of extensive simulations and three real NGS metagenomic datasets: 39 fecal samples from 33 mammalian host species, 56 marine samples from across the world, and 13 fecal samples from human individuals. The results showed that the dissimilarity measure $d_2^S$ can achieve superior performance when comparing metagenomic samples, both in clustering them into groups and in recovering environmental gradients affecting microbial samples. The analyses yield new insights into the environmental factors affecting microbial compositions in metagenomic samples. Our results show that sequence signatures of the mammalian gut are closely associated with the diet and gut physiology of the mammals, and that sequence signatures of marine communities are closely related to location and temperature.
CONCLUSIONS: Sequence signatures can successfully reveal major group and gradient relationships among metagenomic samples from NGS reads without alignment to reference databases. The $d_2^S$ dissimilarity measure is a good choice in all application scenarios. The optimal choice of tuple size depends on sequencing depth, but it is quite robust within a range of choices for moderate sequencing depths.
format Online
Article
Text
id pubmed-3549735
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3549735 2013-01-23 Comparison of metagenomic samples using sequence signatures Jiang, Bai Song, Kai Ren, Jie Deng, Minghua Sun, Fengzhu Zhang, Xuegong BMC Genomics Research Article BioMed Central 2012-12-27 /pmc/articles/PMC3549735/ /pubmed/23268604 http://dx.doi.org/10.1186/1471-2164-13-730 Text en Copyright ©2012 Jiang et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Jiang, Bai
Song, Kai
Ren, Jie
Deng, Minghua
Sun, Fengzhu
Zhang, Xuegong
Comparison of metagenomic samples using sequence signatures
title Comparison of metagenomic samples using sequence signatures
title_sort comparison of metagenomic samples using sequence signatures
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3549735/
https://www.ncbi.nlm.nih.gov/pubmed/23268604
http://dx.doi.org/10.1186/1471-2164-13-730
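
The abstract in this record describes comparing samples through k-tuple frequency vectors and dissimilarity measures between them. As a rough illustration only, the Python sketch below counts k-tuples in a set of reads, then computes a standard l_p distance and a cosine-style d_2 dissimilarity. The record does not give the formulas, so the d_2 normalization used here, 0.5 * (1 - D2 / (|X| |Y|)), is an assumption, and the function names (`ktuple_frequencies`, `lp_distance`, `d2_dissimilarity`) are hypothetical. The background-corrected measures d_2^* and d_2^S need a background sequence model and are not attempted.

```python
from collections import Counter
from itertools import product
from math import sqrt


def ktuple_frequencies(reads, k):
    """Relative frequencies of all 4**k k-tuples over A, C, G, T (fixed order)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            word = read[i:i + k]
            # Skip words containing ambiguous bases such as N.
            if set(word) <= set("ACGT"):
                counts[word] += 1
    total = sum(counts.values())
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts[w] / total if total else 0.0 for w in words]


def lp_distance(f, g, p=2):
    """Standard l_p distance between two frequency vectors of equal length."""
    return sum(abs(a - b) ** p for a, b in zip(f, g)) ** (1.0 / p)


def d2_dissimilarity(f, g):
    """Assumed d_2 form: half of one minus the cosine of the two vectors."""
    dot = sum(a * b for a, b in zip(f, g))
    norm_f = sqrt(sum(a * a for a in f))
    norm_g = sqrt(sum(b * b for b in g))
    if norm_f == 0 or norm_g == 0:
        raise ValueError("a sample contains no countable k-tuples")
    return 0.5 * (1.0 - dot / (norm_f * norm_g))


if __name__ == "__main__":
    # Toy reads, invented purely for illustration.
    sample_a = ["ACGTACGTAC", "GTACGTACGT"]
    sample_b = ["AAAACCCCAA", "CCAAAACCCC"]
    fa = ktuple_frequencies(sample_a, k=3)
    fb = ktuple_frequencies(sample_b, k=3)
    print("l_2:", lp_distance(fa, fb, p=2))
    print("d_2:", d2_dissimilarity(fa, fb))
```

Because the cosine is scale-invariant, this d_2 sketch gives the same value for raw counts and for relative frequencies, so samples of different sequencing depth can be compared directly. The l_p distance, by contrast, depends on the frequency normalization done in `ktuple_frequencies`.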