Comparison of Metatranscriptomic Samples Based on k-Tuple Frequencies

Bibliographic Details
Main authors:	Wang, Ying, Liu, Lin, Chen, Lina, Chen, Ting, Sun, Fengzhu
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2014
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3879298/ https://www.ncbi.nlm.nih.gov/pubmed/24392128 http://dx.doi.org/10.1371/journal.pone.0084348

Description:	BACKGROUND: The comparison of samples, or beta diversity, is one of the essential problems in ecological studies. Next generation sequencing (NGS) technologies make it possible to obtain large amounts of metagenomic and metatranscriptomic short-read sequences across many microbial communities. De novo assembly of the short reads can be especially challenging because the number of genomes and their sequences are generally unknown and the coverage of each genome can be very low, in which case traditional alignment-based sequence comparison methods cannot be used. Alignment-free approaches based on k-tuple frequencies, on the other hand, have yielded promising results for the comparison of metagenomic samples. However, it is not known whether these approaches can be used to compare metatranscriptomic datasets, or which dissimilarity measures perform best. RESULTS: We applied several beta diversity measures based on k-tuple frequencies to real metatranscriptomic datasets from the 454 pyrosequencing and Illumina sequencing platforms to evaluate their effectiveness for clustering metatranscriptomic samples, including three $d_2$ dissimilarity measures, one dissimilarity measure in CVTree, one relative-entropy-based measure, S2, and three classical $L_p$ distances. Results showed that the $d_2^S$ measure achieves superior performance in clustering metatranscriptomic samples into different groups at different sequencing depths for both 454 and Illumina datasets, recovers environmental gradients affecting microbial samples, classifies coexisting metagenomic and metatranscriptomic datasets, and is robust to sequencing errors. We also investigated the effects of tuple size and of the order of the background Markov model. A software pipeline implementing all steps of the analysis is available at http://code.google.com/p/d2-tools/.
CONCLUSIONS: The k-tuple based sequence signature measures can effectively reveal major groups and gradient variation among metatranscriptomic samples from NGS reads. The $d_2^S$ dissimilarity measure performs well in all application scenarios, and its performance is robust with respect to tuple size and order of the Markov model.
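To make the k-tuple approach concrete, the Python below is a minimal sketch of the normalized $d_2^S$ dissimilarity computed directly from reads. Each sample's overlapping k-tuple counts $X_w$ are centered as $\tilde X_w = X_w - \mathrm{E}[X_w]$ under a background model, and two samples are compared as

$$d_2^S = \frac{1}{2}\left(1 - \frac{\sum_w \tilde X_w \tilde Y_w / \sqrt{\tilde X_w^2 + \tilde Y_w^2}}{\sqrt{\sum_w \tilde X_w^2 / \sqrt{\tilde X_w^2 + \tilde Y_w^2}}\,\sqrt{\sum_w \tilde Y_w^2 / \sqrt{\tilde X_w^2 + \tilde Y_w^2}}}\right)$$

This is not the d2-tools code. It assumes an order-0 (i.i.d.) background model only, whereas the study also varies the Markov order, and it approximates $\mathrm{E}[X_w]$ as the number of k-tuples counted times the product of base frequencies. The function names and toy reads are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def count_ktuples(reads, k):
    """Count overlapping k-tuples over all reads, skipping tuples with a non-ACGT base."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            word = read[i:i + k]
            if all(base in ALPHABET for base in word):
                counts[word] += 1
    return counts


def centered_counts(reads, k):
    """Return X_w - E[X_w] for every k-tuple w.

    Assumption (simplification): order-0 background, so
    E[X_w] = (number of k-tuples counted) * product of base frequencies in w.
    """
    word_counts = count_ktuples(reads, k)
    base_counts = count_ktuples(reads, 1)
    n_bases = sum(base_counts.values())
    if n_bases == 0:
        raise ValueError("sample contains no A/C/G/T bases")
    n_words = sum(word_counts.values())
    freq = {b: base_counts[b] / n_bases for b in ALPHABET}
    centered = {}
    for letters in product(ALPHABET, repeat=k):
        word = "".join(letters)
        p_w = 1.0
        for base in word:
            p_w *= freq[base]
        centered[word] = word_counts[word] - n_words * p_w
    return centered


def d2s_dissimilarity(reads_x, reads_y, k):
    """Normalized d2S dissimilarity in [0, 1]; 0 means identical centered profiles."""
    if k < 2:
        # With an order-0 background, single-base counts equal their expectations.
        raise ValueError("k must be >= 2 with an order-0 background")
    x = centered_counts(reads_x, k)
    y = centered_counts(reads_y, k)
    cross = norm_x = norm_y = 0.0
    for w in x:
        weight = sqrt(x[w] ** 2 + y[w] ** 2)
        if weight == 0.0:
            continue  # tuple deviates from the background in neither sample
        cross += x[w] * y[w] / weight
        norm_x += x[w] ** 2 / weight
        norm_y += y[w] ** 2 / weight
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("a sample shows no deviation from its background model")
    return 0.5 * (1.0 - cross / (sqrt(norm_x) * sqrt(norm_y)))


if __name__ == "__main__":
    sample_a = ["ACGTACGTTGCA", "TTGACCGTAGGA"]  # hypothetical toy reads
    sample_b = ["GGGCCCAATTGA", "CCGTTAGGCATA"]
    print(d2s_dissimilarity(sample_a, sample_b, k=3))
```

Running the module prints one dissimilarity for two toy read sets. Comparing many samples, as in the clustering described above, would mean computing this value for every pair and clustering the resulting matrix. Tuples whose centered counts are zero in both samples are skipped so the per-tuple weight never divides by zero.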
Source:	PubMed record pubmed-3879298 (MEDLINE/PubMed format), National Center for Biotechnology Information
Journal:	PLoS One, published 2014-01-02
License:	© 2014 Wang et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.