Comparison of Metatranscriptomic Samples Based on k-Tuple Frequencies
BACKGROUND: The comparison of samples, or beta diversity, is one of the essential problems in ecological studies. Next generation sequencing (NGS) technologies make it possible to obtain large amounts of metagenomic and metatranscriptomic short read sequences across many microbial communities. De no...
Main authors: | Wang, Ying; Liu, Lin; Chen, Lina; Chen, Ting; Sun, Fengzhu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3879298/ https://www.ncbi.nlm.nih.gov/pubmed/24392128 http://dx.doi.org/10.1371/journal.pone.0084348 |
_version_ | 1782297956223811584 |
---|---|
author | Wang, Ying; Liu, Lin; Chen, Lina; Chen, Ting; Sun, Fengzhu |
author_facet | Wang, Ying; Liu, Lin; Chen, Lina; Chen, Ting; Sun, Fengzhu |
author_sort | Wang, Ying |
collection | PubMed |
description | BACKGROUND: The comparison of samples, or beta diversity, is one of the essential problems in ecological studies. Next generation sequencing (NGS) technologies make it possible to obtain large amounts of metagenomic and metatranscriptomic short read sequences across many microbial communities. De novo assembly of the short reads can be especially challenging because the number of genomes and their sequences are generally unknown and the coverage of each genome can be very low, where the traditional alignment-based sequence comparison methods cannot be used. Alignment-free approaches based on k-tuple frequencies, on the other hand, have yielded promising results for the comparison of metagenomic samples. However, it is not known if these approaches can be used for the comparison of metatranscriptome datasets and which dissimilarity measures perform the best. RESULTS: We applied several beta diversity measures based on k-tuple frequencies to real metatranscriptomic datasets from pyrosequencing 454 and Illumina sequencing platforms to evaluate their effectiveness for the clustering of metatranscriptomic samples, including three [Image: see text] dissimilarity measures, one dissimilarity measure in CVTree, one relative entropy based measure S2 and three classical [Image: see text] distances. Results showed that the measure [Image: see text] can achieve superior performance on clustering metatranscriptomic samples into different groups under different sequencing depths for both 454 and Illumina datasets, recovering environmental gradients affecting microbial samples, classifying coexisting metagenomic and metatranscriptomic datasets, and being robust to sequencing errors. We also investigated the effects of tuple size and order of the background Markov model. A software pipeline to implement all the steps of analysis is built and is available at http://code.google.com/p/d2-tools/. 
CONCLUSIONS: The k-tuple based sequence signature measures can effectively reveal major groups and gradient variation among metatranscriptomic samples from NGS reads. The [Image: see text] dissimilarity measure performs well in all application scenarios and its performance is robust with respect to tuple size and order of the Markov model. |
format | Online Article Text |
id | pubmed-3879298 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-38792982014-01-03 Comparison of Metatranscriptomic Samples Based on k-Tuple Frequencies Wang, Ying Liu, Lin Chen, Lina Chen, Ting Sun, Fengzhu PLoS One Research Article BACKGROUND: The comparison of samples, or beta diversity, is one of the essential problems in ecological studies. Next generation sequencing (NGS) technologies make it possible to obtain large amounts of metagenomic and metatranscriptomic short read sequences across many microbial communities. De novo assembly of the short reads can be especially challenging because the number of genomes and their sequences are generally unknown and the coverage of each genome can be very low, where the traditional alignment-based sequence comparison methods cannot be used. Alignment-free approaches based on k-tuple frequencies, on the other hand, have yielded promising results for the comparison of metagenomic samples. However, it is not known if these approaches can be used for the comparison of metatranscriptome datasets and which dissimilarity measures perform the best. RESULTS: We applied several beta diversity measures based on k-tuple frequencies to real metatranscriptomic datasets from pyrosequencing 454 and Illumina sequencing platforms to evaluate their effectiveness for the clustering of metatranscriptomic samples, including three [Image: see text] dissimilarity measures, one dissimilarity measure in CVTree, one relative entropy based measure S2 and three classical [Image: see text] distances. Results showed that the measure [Image: see text] can achieve superior performance on clustering metatranscriptomic samples into different groups under different sequencing depths for both 454 and Illumina datasets, recovering environmental gradients affecting microbial samples, classifying coexisting metagenomic and metatranscriptomic datasets, and being robust to sequencing errors. We also investigated the effects of tuple size and order of the background Markov model. 
A software pipeline to implement all the steps of analysis is built and is available at http://code.google.com/p/d2-tools/. CONCLUSIONS: The k-tuple based sequence signature measures can effectively reveal major groups and gradient variation among metatranscriptomic samples from NGS reads. The [Image: see text] dissimilarity measure performs well in all application scenarios and its performance is robust with respect to tuple size and order of the Markov model. Public Library of Science 2014-01-02 /pmc/articles/PMC3879298/ /pubmed/24392128 http://dx.doi.org/10.1371/journal.pone.0084348 Text en © 2014 Wang et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Wang, Ying Liu, Lin Chen, Lina Chen, Ting Sun, Fengzhu Comparison of Metatranscriptomic Samples Based on k-Tuple Frequencies |
title | Comparison of Metatranscriptomic Samples Based on k-Tuple Frequencies |
title_full | Comparison of Metatranscriptomic Samples Based on k-Tuple Frequencies |
title_fullStr | Comparison of Metatranscriptomic Samples Based on k-Tuple Frequencies |
title_full_unstemmed | Comparison of Metatranscriptomic Samples Based on k-Tuple Frequencies |
title_short | Comparison of Metatranscriptomic Samples Based on k-Tuple Frequencies |
title_sort | comparison of metatranscriptomic samples based on k-tuple frequencies |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3879298/ https://www.ncbi.nlm.nih.gov/pubmed/24392128 http://dx.doi.org/10.1371/journal.pone.0084348 |
work_keys_str_mv | AT wangying comparisonofmetatranscriptomicsamplesbasedonktuplefrequencies AT liulin comparisonofmetatranscriptomicsamplesbasedonktuplefrequencies AT chenlina comparisonofmetatranscriptomicsamplesbasedonktuplefrequencies AT chenting comparisonofmetatranscriptomicsamplesbasedonktuplefrequencies AT sunfengzhu comparisonofmetatranscriptomicsamplesbasedonktuplefrequencies |
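
The alignment-free comparison summarized in the description above rests on counting k-tuples (length-k substrings) in the reads of each sample and comparing the resulting frequency vectors. The following is a minimal sketch of that idea, not the authors' d2-tools pipeline. It assumes DNA reads over the alphabet A, C, G, T and uses Manhattan and Euclidean distances as simple stand-ins for a dissimilarity measure. It does not model the background Markov chain mentioned in the abstract. The function names and the toy reads are hypothetical.

```python
from collections import Counter
from itertools import product
import math

ALPHABET = "ACGT"  # assumption: reads are DNA; tuples with other symbols are skipped


def ktuple_frequencies(reads, k):
    """Return the relative frequency of every possible k-tuple across all reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            tup = read[i:i + k]
            # Ignore tuples containing ambiguous bases such as N.
            if all(base in ALPHABET for base in tup):
                counts[tup] += 1
    total = sum(counts.values())
    all_tuples = ["".join(p) for p in product(ALPHABET, repeat=k)]
    if total == 0:
        return {t: 0.0 for t in all_tuples}
    return {t: counts[t] / total for t in all_tuples}


def manhattan(f, g):
    """L1 distance between two frequency vectors defined over the same tuples."""
    return sum(abs(f[t] - g[t]) for t in f)


def euclidean(f, g):
    """L2 distance between two frequency vectors defined over the same tuples."""
    return math.sqrt(sum((f[t] - g[t]) ** 2 for t in f))


# Hypothetical toy samples: a and b are identical, c has a different composition.
sample_a = ["ACGTACGT", "ACGTTT"]
sample_b = ["ACGTACGT", "ACGTTT"]
sample_c = ["GGGGCCCC", "CCCCGGGG"]

freq_a = ktuple_frequencies(sample_a, 2)
freq_b = ktuple_frequencies(sample_b, 2)
freq_c = ktuple_frequencies(sample_c, 2)

print("a vs b (Manhattan):", manhattan(freq_a, freq_b))
print("a vs c (Manhattan):", manhattan(freq_a, freq_c))
print("a vs c (Euclidean):", euclidean(freq_a, freq_c))
```

A pairwise distance matrix built this way over many samples could then be passed to any standard clustering routine. The tuple size k is a free parameter here, and the abstract notes that the authors studied its effect.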