
The bench scientist's guide to statistical analysis of RNA-Seq data


Bibliographic Details
Main authors:	Yendrek, Craig R; Ainsworth, Elizabeth A; Thimmapuram, Jyothi
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Technical Note
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3522531/ https://www.ncbi.nlm.nih.gov/pubmed/22980220 http://dx.doi.org/10.1186/1756-0500-5-506

author	Yendrek, Craig R; Ainsworth, Elizabeth A; Thimmapuram, Jyothi
collection	PubMed
description	BACKGROUND: RNA sequencing (RNA-Seq) is emerging as a highly accurate method to quantify transcript abundance. However, analyses of the large data sets obtained by sequencing the entire transcriptome of organisms have generally been performed by bioinformatics specialists. Here we provide a step-by-step guide and outline a strategy using currently available statistical tools that results in a conservative list of differentially expressed genes. We also discuss potential sources of error in RNA-Seq analysis that could alter interpretation of global changes in gene expression. FINDINGS: When comparing statistical tools, the negative binomial distribution-based methods, edgeR and DESeq, respectively identified 11,995 and 11,317 differentially expressed genes from an RNA-Seq dataset generated from soybean leaf tissue grown in elevated O₃. However, the number of genes in common between these two methods was only 10,535, resulting in 2,242 genes determined to be differentially expressed by only one method. Upon analysis of the non-significant genes, several limitations of these analytic tools were revealed, including evidence for overly stringent parameters for determining statistical significance of differentially expressed genes as well as increased type II error for high abundance transcripts. CONCLUSIONS: Because of the high variability between methods for determining differential expression of RNA-Seq data, we suggest using several bioinformatics tools, as outlined here, to ensure that a conservative list of differentially expressed genes is obtained. We also conclude that despite these analytical limitations, RNA-Seq provides highly accurate transcript abundance quantification that is comparable to qRT-PCR.
format	Online Article Text
id	pubmed-3522531
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3522531 2012-12-15 The bench scientist's guide to statistical analysis of RNA-Seq data Yendrek, Craig R; Ainsworth, Elizabeth A; Thimmapuram, Jyothi BMC Res Notes Technical Note BioMed Central 2012-09-14 /pmc/articles/PMC3522531/ /pubmed/22980220 http://dx.doi.org/10.1186/1756-0500-5-506 Text en Copyright ©2012 Yendrek et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	The bench scientist's guide to statistical analysis of RNA-Seq data
topic	Technical Note
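
The gene counts quoted in the abstract's FINDINGS can be cross-checked with simple set arithmetic. The sketch below is only an illustration of that check: the three input numbers come from the abstract, while the variable names and the per-method breakdown are assumptions introduced here, not values reported in the record.

```python
# Counts taken from the abstract (FINDINGS).
edger_de = 11_995   # genes called differentially expressed by edgeR
deseq_de = 11_317   # genes called differentially expressed by DESeq
shared_de = 10_535  # genes called by both methods

# Genes called by exactly one method (derived here, not stated in the record).
edger_only = edger_de - shared_de
deseq_only = deseq_de - shared_de
one_method_only = edger_only + deseq_only

# Genes called by at least one method.
called_by_either = shared_de + one_method_only

print(f"edgeR only: {edger_only}")
print(f"DESeq only: {deseq_only}")
print(f"one method only: {one_method_only}")
print(f"either method: {called_by_either}")
```

The total for genes called by only one method agrees with the 2,242 stated in the abstract; the shared set of 10,535 is presumably what the authors mean by a conservative list.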


Similar items