Comparison of Metatranscriptomic Samples Based on k-Tuple Frequencies

Bibliographic Details
Main Authors: Wang, Ying; Liu, Lin; Chen, Lina; Chen, Ting; Sun, Fengzhu
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2014
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3879298/
https://www.ncbi.nlm.nih.gov/pubmed/24392128
http://dx.doi.org/10.1371/journal.pone.0084348
collection PubMed
description BACKGROUND: The comparison of samples, or beta diversity, is one of the essential problems in ecological studies. Next generation sequencing (NGS) technologies make it possible to obtain large amounts of metagenomic and metatranscriptomic short read sequences across many microbial communities. De novo assembly of the short reads can be especially challenging because the number of genomes and their sequences are generally unknown and the coverage of each genome can be very low, so traditional alignment-based sequence comparison methods cannot be used. Alignment-free approaches based on k-tuple frequencies, on the other hand, have yielded promising results for the comparison of metagenomic samples. However, it is not known whether these approaches can be used to compare metatranscriptome datasets or which dissimilarity measures perform best. RESULTS: We applied several beta diversity measures based on k-tuple frequencies to real metatranscriptomic datasets from the 454 pyrosequencing and Illumina sequencing platforms to evaluate their effectiveness for clustering metatranscriptomic samples, including three d2 dissimilarity measures, one dissimilarity measure in CVTree, one relative entropy based measure S2 and three classical Lp distances. Results showed that the measure d2S can achieve superior performance in clustering metatranscriptomic samples into different groups under different sequencing depths for both 454 and Illumina datasets, in recovering environmental gradients affecting microbial samples, and in classifying coexisting metagenomic and metatranscriptomic datasets, and that it is robust to sequencing errors. We also investigated the effects of tuple size and of the order of the background Markov model. A software pipeline implementing all steps of the analysis has been built and is available at http://code.google.com/p/d2-tools/.
CONCLUSIONS: The k-tuple based sequence signature measures can effectively reveal major groups and gradient variation among metatranscriptomic samples from NGS reads. The d2S dissimilarity measure performs well in all application scenarios, and its performance is robust with respect to tuple size and the order of the Markov model.
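The abstract compares k-tuple dissimilarity measures but, as a catalog summary, gives no formulas. The Python sketch below illustrates two of the measure families it names, assuming the definitions commonly used in the alignment-free comparison literature: the d2S dissimilarity built from centered counts X~_w = X_w - N * p_w, and classical Lp distances on relative k-tuple frequencies. It is not the d2-tools implementation. The function names are hypothetical, the background is assumed to be an order-0 (i.i.d. nucleotide) model rather than the higher-order Markov models the study also examines, and details such as reverse-complement handling are omitted.

```python
# Hypothetical sketch of k-tuple dissimilarities; not the d2-tools code.
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_counts(reads, k):
    """Count overlapping k-tuples over all reads, skipping tuples with non-ACGT symbols."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            word = read[i:i + k]
            if all(c in ALPHABET for c in word):
                counts[word] += 1
    return counts


def centered_counts(reads, k):
    """Return X~_w = X_w - N * p_w for every k-tuple w over ACGT.

    Assumption: order-0 (i.i.d.) background, so p_w is the product of the
    sample's nucleotide frequencies; N is the number of counted k-tuples.
    Reads must contain at least one A/C/G/T symbol.
    """
    counts = kmer_counts(reads, k)
    base_counts = Counter(c for read in reads for c in read.upper() if c in ALPHABET)
    total = sum(base_counts.values())
    freq = {a: base_counts[a] / total for a in ALPHABET}
    n_words = sum(counts.values())
    centered = {}
    for letters in product(ALPHABET, repeat=k):
        word = "".join(letters)
        p_w = math.prod(freq[a] for a in word)
        centered[word] = counts[word] - n_words * p_w
    return centered


def d2s_dissimilarity(reads_x, reads_y, k):
    """d2S = 0.5 * (1 - D2S / sqrt(Sx * Sy)), a value in [0, 1]."""
    x = centered_counts(reads_x, k)
    y = centered_counts(reads_y, k)
    d2s_stat = s_x = s_y = 0.0
    for word in x:
        norm = math.sqrt(x[word] ** 2 + y[word] ** 2)
        if norm == 0.0:
            continue  # both centered counts are zero: no contribution
        d2s_stat += x[word] * y[word] / norm
        s_x += x[word] ** 2 / norm
        s_y += y[word] ** 2 / norm
    if s_x == 0.0 or s_y == 0.0:
        raise ValueError("a sample matches its background exactly; d2S is undefined")
    return 0.5 * (1.0 - d2s_stat / (math.sqrt(s_x) * math.sqrt(s_y)))


def lp_distance(reads_x, reads_y, k, p):
    """Lp distance between relative k-tuple frequency vectors (p=math.inf: Chebyshev)."""
    cx, cy = kmer_counts(reads_x, k), kmer_counts(reads_y, k)
    nx, ny = sum(cx.values()), sum(cy.values())
    diffs = [abs(cx[w] / nx - cy[w] / ny) for w in set(cx) | set(cy)]
    if math.isinf(p):
        return max(diffs, default=0.0)
    return sum(d ** p for d in diffs) ** (1.0 / p)


if __name__ == "__main__":
    sample_a = ["ACGTTGCAAC", "GGTACCATGA"]
    sample_b = ["TTTTGCAAAA", "ACACACGTGT"]
    print("d2S:", d2s_dissimilarity(sample_a, sample_b, k=2))
    print("L2 :", lp_distance(sample_a, sample_b, k=2, p=2))
```

Under these assumptions, d2S lies in [0, 1] and is 0 when a sample is compared with itself. Supporting a higher-order Markov background would change only how p_w is estimated in centered_counts; the d2S combination step stays the same.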
id pubmed-3879298
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal PLoS One, Research Article, published 2014-01-02
rights © 2014 Wang et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
topic Research Article