Reducing bias in RNA sequencing data: a novel approach to compute counts

BACKGROUND: In the last decade, Next-Generation Sequencing technologies have been extensively applied to quantitative transcriptomics, making RNA sequencing a valuable alternative to microarrays for measuring and comparing gene transcription levels. Although several methods have been proposed to pro...

Bibliographic Details
Main authors:	Finotello, Francesca; Lavezzo, Enrico; Bianco, Luca; Barzon, Luisa; Mazzon, Paolo; Fontana, Paolo; Toppo, Stefano; Di Camillo, Barbara
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4016203/ https://www.ncbi.nlm.nih.gov/pubmed/24564404 http://dx.doi.org/10.1186/1471-2105-15-S1-S7

author	Finotello, Francesca; Lavezzo, Enrico; Bianco, Luca; Barzon, Luisa; Mazzon, Paolo; Fontana, Paolo; Toppo, Stefano; Di Camillo, Barbara
author_sort	Finotello, Francesca
collection	PubMed
description	BACKGROUND: In the last decade, Next-Generation Sequencing technologies have been extensively applied to quantitative transcriptomics, making RNA sequencing a valuable alternative to microarrays for measuring and comparing gene transcription levels. Although several methods have been proposed to provide an unbiased estimate of transcript abundances through data normalization, all of them are based on an initial count of the total number of reads mapping to each transcript. This procedure, in principle robust to random noise, is actually error-prone if reads are not uniformly distributed along sequences, as indeed happens because of sequencing errors and ambiguity in read mapping. Here we propose a new approach, called maxcounts, which quantifies the expression assigned to an exon as the maximum of its per-base counts, and we assess its performance in comparison with the standard approach described above, which considers the total number of reads aligned to an exon. The two measures are compared using multiple data sets and several evaluation criteria: independence from gene-specific covariates, such as exon length and GC-content; accuracy and precision in the quantification of true concentrations; and robustness of measurements to variations in alignment quality. RESULTS: Both measures show high accuracy and low dependency on GC-content. However, maxcounts expression quantification is less biased towards long exons than the standard approach. Moreover, it shows lower technical variability at low expression levels and is more robust to variations in the quality of alignments. CONCLUSIONS: In summary, we confirm that counts computed with the standard approach depend on the length of the feature they are summarized on, and are sensitive to the non-uniform distribution of reads along transcripts. Conversely, maxcounts are robust to biases due to the non-uniform distribution of reads and are characterized by a lower technical variability. Hence, we propose maxcounts as an alternative approach for quantitative RNA-sequencing applications.
format	Online Article Text
id	pubmed-4016203
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4016203 2014-05-23 Reducing bias in RNA sequencing data: a novel approach to compute counts Finotello, Francesca; Lavezzo, Enrico; Bianco, Luca; Barzon, Luisa; Mazzon, Paolo; Fontana, Paolo; Toppo, Stefano; Di Camillo, Barbara BMC Bioinformatics Research BioMed Central 2014-01-10 /pmc/articles/PMC4016203/ /pubmed/24564404 http://dx.doi.org/10.1186/1471-2105-15-S1-S7 Text en Copyright © 2014 Finotello et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Reducing bias in RNA sequencing data: a novel approach to compute counts
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4016203/ https://www.ncbi.nlm.nih.gov/pubmed/24564404 http://dx.doi.org/10.1186/1471-2105-15-S1-S7
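
The two measures contrasted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`per_base_counts`, `total_counts`, `max_counts`), the half-open `(start, end)` interval representation, and the toy reads are assumptions made for illustration, and the sketch ignores spliced reads, strand, and multi-mapping. The standard approach counts the reads overlapping an exon, whereas maxcounts takes the maximum of the exon's per-base counts.

```python
from typing import List, Tuple

# Hypothetical representation: half-open [start, end) reference coordinates.
Interval = Tuple[int, int]


def per_base_counts(exon: Interval, reads: List[Interval]) -> List[int]:
    """Number of reads covering each base of the exon."""
    exon_start, exon_end = exon
    # Difference array: +1 where coverage by a read begins, -1 where it ends.
    diff = [0] * (exon_end - exon_start + 1)
    for read_start, read_end in reads:
        lo = max(read_start, exon_start)
        hi = min(read_end, exon_end)
        if lo < hi:  # the read overlaps the exon by at least one base
            diff[lo - exon_start] += 1
            diff[hi - exon_start] -= 1
    coverage, running = [], 0
    for delta in diff[:-1]:  # the last slot only absorbs end markers
        running += delta
        coverage.append(running)
    return coverage


def total_counts(exon: Interval, reads: List[Interval]) -> int:
    """Standard approach: number of reads overlapping the exon."""
    exon_start, exon_end = exon
    return sum(1 for s, e in reads if max(s, exon_start) < min(e, exon_end))


def max_counts(exon: Interval, reads: List[Interval]) -> int:
    """maxcounts-style measure: maximum of the exon's per-base counts."""
    return max(per_base_counts(exon, reads), default=0)


# Toy data (made up): a 10-base exon and four reads, one of them off-target.
exon = (100, 110)
reads = [(95, 105), (102, 108), (107, 115), (200, 210)]
print(total_counts(exon, reads), max_counts(exon, reads))
```

In this toy case three reads overlap the exon, but no single base is covered by more than two of them, so the two measures differ. Because the maximum is taken per base, lengthening the exon does not by itself raise the value, which is the intuition behind the reduced length bias reported in the abstract.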

Similar items