SERE: Single-parameter quality control and sample comparison for RNA-Seq

BACKGROUND: Assessing the reliability of experimental replicates (or global alterations corresponding to different experimental conditions) is a critical step in analyzing RNA-Seq data. Pearson’s correlation coefficient r has been widely used in the RNA-Seq field even though its statistical characte...

Bibliographic Details
Main authors: Schulze, Stefan K; Kanwar, Rahul; Gölzenleuchter, Meike; Therneau, Terry M; Beutler, Andreas S
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3534338/
https://www.ncbi.nlm.nih.gov/pubmed/23033915
http://dx.doi.org/10.1186/1471-2164-13-524
_version_ 1782475317774909440
author Schulze, Stefan K
Kanwar, Rahul
Gölzenleuchter, Meike
Therneau, Terry M
Beutler, Andreas S
author_facet Schulze, Stefan K
Kanwar, Rahul
Gölzenleuchter, Meike
Therneau, Terry M
Beutler, Andreas S
author_sort Schulze, Stefan K
collection PubMed
description BACKGROUND: Assessing the reliability of experimental replicates (or global alterations corresponding to different experimental conditions) is a critical step in analyzing RNA-Seq data. Pearson’s correlation coefficient r has been widely used in the RNA-Seq field even though its statistical characteristics may be poorly suited to the task. RESULTS: Here we present a single-parameter test procedure for count data, the Simple Error Ratio Estimate (SERE), that can determine whether two RNA-Seq libraries are faithful replicates or globally different. Benchmarking shows that the interpretation of SERE is unambiguous regardless of the total read count or the range of expression differences among bins (exons or genes): a score of 1 indicates faithful replication (i.e., samples are affected only by Poisson variation of individual counts), a score of 0 indicates data duplication, and scores >1 correspond to true global differences between RNA-Seq libraries. In contrast, the interpretation of Pearson’s r is generally ambiguous and highly dependent on sequencing depth and the range of expression levels inherent to the sample (difference between lowest and highest bin count). Cohen’s simple Kappa results are also ambiguous and highly dependent on the choice of bins. For quantifying global sample differences, SERE performs similarly to a measure based on the negative binomial distribution yet is simpler to compute. CONCLUSIONS: SERE can therefore serve as a straightforward and reliable statistical procedure for the global assessment of pairs or large groups of RNA-Seq datasets by a single statistical parameter.
format Online
Article
Text
id pubmed-3534338
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-35343382013-01-03 SERE: Single-parameter quality control and sample comparison for RNA-Seq Schulze, Stefan K Kanwar, Rahul Gölzenleuchter, Meike Therneau, Terry M Beutler, Andreas S BMC Genomics Methodology Article BACKGROUND: Assessing the reliability of experimental replicates (or global alterations corresponding to different experimental conditions) is a critical step in analyzing RNA-Seq data. Pearson’s correlation coefficient r has been widely used in the RNA-Seq field even though its statistical characteristics may be poorly suited to the task. RESULTS: Here we present a single-parameter test procedure for count data, the Simple Error Ratio Estimate (SERE), that can determine whether two RNA-Seq libraries are faithful replicates or globally different. Benchmarking shows that the interpretation of SERE is unambiguous regardless of the total read count or the range of expression differences among bins (exons or genes): a score of 1 indicates faithful replication (i.e., samples are affected only by Poisson variation of individual counts), a score of 0 indicates data duplication, and scores >1 correspond to true global differences between RNA-Seq libraries. In contrast, the interpretation of Pearson’s r is generally ambiguous and highly dependent on sequencing depth and the range of expression levels inherent to the sample (difference between lowest and highest bin count). Cohen’s simple Kappa results are also ambiguous and highly dependent on the choice of bins. For quantifying global sample differences, SERE performs similarly to a measure based on the negative binomial distribution yet is simpler to compute. CONCLUSIONS: SERE can therefore serve as a straightforward and reliable statistical procedure for the global assessment of pairs or large groups of RNA-Seq datasets by a single statistical parameter.
BioMed Central 2012-10-03 /pmc/articles/PMC3534338/ /pubmed/23033915 http://dx.doi.org/10.1186/1471-2164-13-524 Text en Copyright ©2012 Schulze et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology Article
Schulze, Stefan K
Kanwar, Rahul
Gölzenleuchter, Meike
Therneau, Terry M
Beutler, Andreas S
SERE: Single-parameter quality control and sample comparison for RNA-Seq
title SERE: Single-parameter quality control and sample comparison for RNA-Seq
title_full SERE: Single-parameter quality control and sample comparison for RNA-Seq
title_fullStr SERE: Single-parameter quality control and sample comparison for RNA-Seq
title_full_unstemmed SERE: Single-parameter quality control and sample comparison for RNA-Seq
title_short SERE: Single-parameter quality control and sample comparison for RNA-Seq
title_sort sere: single-parameter quality control and sample comparison for rna-seq
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3534338/
https://www.ncbi.nlm.nih.gov/pubmed/23033915
http://dx.doi.org/10.1186/1471-2164-13-524
work_keys_str_mv AT schulzestefank seresingleparameterqualitycontrolandsamplecomparisonforrnaseq
AT kanwarrahul seresingleparameterqualitycontrolandsamplecomparisonforrnaseq
AT golzenleuchtermeike seresingleparameterqualitycontrolandsamplecomparisonforrnaseq
AT therneauterrym seresingleparameterqualitycontrolandsamplecomparisonforrnaseq
AT beutlerandreass seresingleparameterqualitycontrolandsamplecomparisonforrnaseq
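
Illustrative note on the description field: the abstract states how a SERE score is read (0 for duplicated data, 1 for faithful replicates, above 1 for global differences) but this record does not give the formula. The sketch below assumes the commonly cited definition, in which the observed variation of each bin around its library-size-scaled expected count is divided by the Poisson expectation and the ratios are averaged over bins. The function name `sere` and the list-of-rows input layout are hypothetical choices made for illustration, not taken from the record or from the authors' software.

```python
import math


def sere(counts):
    """Sketch of a Simple Error Ratio Estimate (assumed definition).

    counts: list of bins (genes or exons); each bin is a list holding
    one read count per sample.
    """
    n_samples = len(counts[0])
    if n_samples < 2:
        raise ValueError("need at least two samples")

    # Library size of each sample = sum of its counts over all bins.
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    total = sum(lib_sizes)
    if min(lib_sizes) == 0:
        raise ValueError("every sample needs at least one read")

    ratios = []
    for row in counts:
        row_total = sum(row)
        if row_total == 0:
            continue  # bins without reads carry no information
        ratio = 0.0
        for j, observed in enumerate(row):
            # Expected count if the bin's reads were shared out in
            # proportion to library size (Poisson mean = variance).
            expected = row_total * lib_sizes[j] / total
            ratio += (observed - expected) ** 2 / expected
        ratios.append(ratio / (n_samples - 1))

    if not ratios:
        raise ValueError("all bins are empty")
    return math.sqrt(sum(ratios) / len(ratios))


duplicated = [[10, 10], [200, 200], [5, 5]]
different = [[100, 10], [10, 100], [50, 50]]
print(sere(duplicated), sere(different))
```

Under this assumed definition, the duplicated pair gives a score of 0 and the strongly discordant pair gives a score well above 1, matching the interpretation given in the abstract. Replicates that differ only by Poisson noise would be expected to score close to 1.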