
Metatranscriptomes from diverse microbial communities: assessment of data reduction techniques for rigorous annotation

BACKGROUND: Metatranscriptome sequence data can contain highly redundant sequences from diverse populations of microbes and so data reduction techniques are often applied before taxonomic and functional annotation. For metagenomic data, it has been observed that the variable coverage and presence of...


Bibliographic Details
Main Authors: Toseland, Andrew; Moxon, Simon; Mock, Thomas; Moulton, Vincent
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4209020/
https://www.ncbi.nlm.nih.gov/pubmed/25318651
http://dx.doi.org/10.1186/1471-2164-15-901
_version_ 1782341205658435584
author Toseland, Andrew
Moxon, Simon
Mock, Thomas
Moulton, Vincent
collection PubMed
description BACKGROUND: Metatranscriptome sequence data can contain highly redundant sequences from diverse populations of microbes and so data reduction techniques are often applied before taxonomic and functional annotation. For metagenomic data, it has been observed that the variable coverage and presence of closely related organisms can lead to fragmented assemblies containing chimeric contigs that may reduce the accuracy of downstream analyses and some advocate the use of alternate data reduction techniques. However, it is unclear how such data reduction techniques impact the annotation of metatranscriptome data and thus affect the interpretation of the results. RESULTS: To investigate the effect of such techniques on the annotation of metatranscriptome data we assess two commonly employed methods: clustering and de-novo assembly. To do this, we also developed an approach to simulate 454 and Illumina metatranscriptome data sets with varying degrees of taxonomic diversity. For the Illumina simulations, we found that a two-step approach of assembly followed by clustering of contigs and unassembled sequences produced the most accurate reflection of the real protein domain content of the sample. For the 454 simulations, the combined annotation of contigs and unassembled reads produced the most accurate protein domain annotations. CONCLUSIONS: Based on these data we recommend that assembly be attempted, and that unassembled reads be included in the final annotation for metatranscriptome data, even from highly diverse environments as the resulting annotations should lead to a more accurate reflection of the transcriptional behaviour of the microbial population under investigation. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-901) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4209020
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4209020 2014-10-28 Metatranscriptomes from diverse microbial communities: assessment of data reduction techniques for rigorous annotation Toseland, Andrew Moxon, Simon Mock, Thomas Moulton, Vincent BMC Genomics Research Article BioMed Central 2014-10-15 /pmc/articles/PMC4209020/ /pubmed/25318651 http://dx.doi.org/10.1186/1471-2164-15-901 Text en © Toseland et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Metatranscriptomes from diverse microbial communities: assessment of data reduction techniques for rigorous annotation
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4209020/
https://www.ncbi.nlm.nih.gov/pubmed/25318651
http://dx.doi.org/10.1186/1471-2164-15-901