SAMSA: a comprehensive metatranscriptome analysis pipeline

BACKGROUND: Although metatranscriptomics—the study of diverse microbial population activity based on RNA-seq data—is rapidly growing in popularity, there are limited options for biologists to analyze this type of data. Current approaches for processing metatranscriptomes rely on restricted databases...

Bibliographic Details
Main Authors: Westreich, Samuel T., Korf, Ian, Mills, David A., Lemay, Danielle G.
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5041328/
https://www.ncbi.nlm.nih.gov/pubmed/27687690
http://dx.doi.org/10.1186/s12859-016-1270-8
_version_ 1782456388538073088
author Westreich, Samuel T.
Korf, Ian
Mills, David A.
Lemay, Danielle G.
author_facet Westreich, Samuel T.
Korf, Ian
Mills, David A.
Lemay, Danielle G.
author_sort Westreich, Samuel T.
collection PubMed
description BACKGROUND: Although metatranscriptomics—the study of diverse microbial population activity based on RNA-seq data—is rapidly growing in popularity, there are limited options for biologists to analyze this type of data. Current approaches for processing metatranscriptomes rely on restricted databases and a dedicated computing cluster, or metagenome-based approaches that have not been fully evaluated for processing metatranscriptomic datasets. We created a new bioinformatics pipeline, designed specifically for metatranscriptome dataset analysis, which runs in conjunction with Metagenome-RAST (MG-RAST) servers. Designed for use by researchers with relatively little bioinformatics experience, SAMSA offers a breakdown of metatranscriptome transcription activity levels by organism or transcript function, and is fully open source. We used this new tool to evaluate best practices for sequencing stool metatranscriptomes. RESULTS: Working with the MG-RAST annotation server, we constructed the Simple Annotation of Metatranscriptomes by Sequence Analysis (SAMSA) software package, a complete pipeline for the analysis of gut microbiome data. SAMSA can summarize and evaluate raw annotation results, identifying abundant species and significant functional differences between metatranscriptomes. Using pilot data and simulated subsets, we determined experimental requirements for fecal gut metatranscriptomes. Sequences need to be either long reads (longer than 100 bp) or joined paired-end reads. Each sample needs 40–50 million raw sequences, which can be expected to yield the 5–10 million annotated reads necessary for accurate abundance measures. We also demonstrated that ribosomal RNA depletion does not equally deplete ribosomes from all species within a sample, and remaining rRNA sequences should be discarded. 
Using publicly available metatranscriptome data in which rRNA was not depleted, we were able to demonstrate that overall organism transcriptional activity can be measured using mRNA counts. We were also able to detect significant differences between control and experimental groups in both organism transcriptional activity and specific cellular functions. CONCLUSIONS: By making this new pipeline publicly available, we have created a powerful new tool for metatranscriptomics research, offering a new method for greater insight into the activity of diverse microbial communities. We further recommend that stool metatranscriptomes be ribodepleted and sequenced in a 100 bp paired end format with a minimum of 40 million reads per sample. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1270-8) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5041328
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5041328 2016-10-05 SAMSA: a comprehensive metatranscriptome analysis pipeline Westreich, Samuel T. Korf, Ian Mills, David A. Lemay, Danielle G. BMC Bioinformatics Methodology Article BioMed Central 2016-09-29 /pmc/articles/PMC5041328/ /pubmed/27687690 http://dx.doi.org/10.1186/s12859-016-1270-8 Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
Westreich, Samuel T.
Korf, Ian
Mills, David A.
Lemay, Danielle G.
SAMSA: a comprehensive metatranscriptome analysis pipeline
title SAMSA: a comprehensive metatranscriptome analysis pipeline
title_full SAMSA: a comprehensive metatranscriptome analysis pipeline
title_fullStr SAMSA: a comprehensive metatranscriptome analysis pipeline
title_full_unstemmed SAMSA: a comprehensive metatranscriptome analysis pipeline
title_short SAMSA: a comprehensive metatranscriptome analysis pipeline
title_sort samsa: a comprehensive metatranscriptome analysis pipeline
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5041328/
https://www.ncbi.nlm.nih.gov/pubmed/27687690
http://dx.doi.org/10.1186/s12859-016-1270-8
work_keys_str_mv AT westreichsamuelt samsaacomprehensivemetatranscriptomeanalysispipeline
AT korfian samsaacomprehensivemetatranscriptomeanalysispipeline
AT millsdavida samsaacomprehensivemetatranscriptomeanalysispipeline
AT lemaydanielleg samsaacomprehensivemetatranscriptomeanalysispipeline