Unifying the analysis of high-throughput sequencing datasets: characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis

BACKGROUND: Experimental designs that take advantage of high-throughput sequencing to generate datasets include RNA sequencing (RNA-seq), chromatin immunoprecipitation sequencing (ChIP-seq), sequencing of 16S rRNA gene fragments, metagenomic analysis and selective growth experiments. In each case th...

Bibliographic Details
Main Authors:	Fernandes, Andrew D; Reid, Jennifer NS; Macklaim, Jean M; McMurrough, Thomas A; Edgell, David R; Gloor, Gregory B
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Methodology
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4030730/ https://www.ncbi.nlm.nih.gov/pubmed/24910773 http://dx.doi.org/10.1186/2049-2618-2-15

author	Fernandes, Andrew D; Reid, Jennifer NS; Macklaim, Jean M; McMurrough, Thomas A; Edgell, David R; Gloor, Gregory B
collection	PubMed
description	BACKGROUND: Experimental designs that take advantage of high-throughput sequencing to generate datasets include RNA sequencing (RNA-seq), chromatin immunoprecipitation sequencing (ChIP-seq), sequencing of 16S rRNA gene fragments, metagenomic analysis and selective growth experiments. In each case the underlying data are similar and are composed of counts of sequencing reads mapped to a large number of features in each sample. Despite this underlying similarity, the data analysis methods used for these experimental designs are all different, and do not translate across experiments. Alternative methods have been developed in the physical and geological sciences that treat similar data as compositions. Compositional data analysis methods transform the data to relative abundances with the result that the analyses are more robust and reproducible. RESULTS: Data from an in vitro selective growth experiment, an RNA-seq experiment and the Human Microbiome Project 16S rRNA gene abundance dataset were examined by ALDEx2, a compositional data analysis tool that uses Bayesian methods to infer technical and statistical error. The ALDEx2 approach is shown to be suitable for all three types of data: it correctly identifies both the direction and differential abundance of features in the differential growth experiment, it identifies a substantially similar set of differentially expressed genes in the RNA-seq dataset as the leading tools and it identifies as differential the taxa that distinguish the tongue dorsum and buccal mucosa in the Human Microbiome Project dataset. The design of ALDEx2 reduces the number of false positive identifications that result from datasets composed of many features in few samples. 
CONCLUSION: Statistical analysis of high-throughput sequencing datasets composed of per feature counts showed that the ALDEx2 R package is a simple and robust tool, which can be applied to RNA-seq, 16S rRNA gene sequencing and differential growth datasets, and by extension to other techniques that use a similar approach.
format	Online Article Text
id	pubmed-4030730
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4030730 2014-06-06 Microbiome Methodology BioMed Central 2014-05-05 /pmc/articles/PMC4030730/ /pubmed/24910773 http://dx.doi.org/10.1186/2049-2618-2-15 Text en Copyright © 2014 Fernandes et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Unifying the analysis of high-throughput sequencing datasets: characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis
topic	Methodology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4030730/ https://www.ncbi.nlm.nih.gov/pubmed/24910773 http://dx.doi.org/10.1186/2049-2618-2-15

Similar Items