An integrative method to normalize RNA-Seq data

BACKGROUND: Transcriptome sequencing is a powerful tool for measuring gene expression, but, as with other technologies, various artifacts and biases affect the quantification. To correct some of them, several normalization approaches have emerged, differing both in the statistical st...

Bibliographic Details
Main authors: Filloux, Cyril; Cédric, Meersseman; Romain, Philippe; Lionel, Forestier; Christophe, Klopp; Dominique, Rocha; Abderrahman, Maftah; Daniel, Petit
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4067528/
https://www.ncbi.nlm.nih.gov/pubmed/24929920
http://dx.doi.org/10.1186/1471-2105-15-188
_version_ 1782322304580059136
author Filloux, Cyril
Cédric, Meersseman
Romain, Philippe
Lionel, Forestier
Christophe, Klopp
Dominique, Rocha
Abderrahman, Maftah
Daniel, Petit
author_sort Filloux, Cyril
collection PubMed
description BACKGROUND: Transcriptome sequencing is a powerful tool for measuring gene expression, but, as with other technologies, various artifacts and biases affect the quantification. To correct some of them, several normalization approaches have emerged, differing both in the statistical strategy employed and in the type of bias corrected. However, there is no clear standard normalization method. RESULTS: We present a novel methodology to normalize RNA-Seq data, taking into account transcript size, GC content, and sequencing depth, which are the major quantification-related biases. First, we found that transcripts shorter than 600 bp have an underestimated expression level, while for longer transcripts the overestimation increases with length. Second, it is well known that the higher the GC content (>50%), the more transcript expression is underestimated. Third, we demonstrated that sequencing depth affects the size bias and proposed a correction allowing the comparison of expression levels among many samples. The efficiency of our approach was then tested by comparing the correlation between normalized RNA-Seq data and qRT-PCR expression measurements. All the steps are automated in a program written in Perl and available on request. CONCLUSIONS: The methodology presented in this article identifies and corrects different biases that influence RNA-Seq quantification, and provides more accurate estimates of gene expression levels. This method can be applied to compare expression quantifications from many samples, preferably from the same tissue. To compare samples from different tissues, a calibration using several reference genes will be required.
format Online
Article
Text
id pubmed-4067528
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4067528 2014-06-27 An integrative method to normalize RNA-Seq data Filloux, Cyril; Cédric, Meersseman; Romain, Philippe; Lionel, Forestier; Christophe, Klopp; Dominique, Rocha; Abderrahman, Maftah; Daniel, Petit BMC Bioinformatics Research Article BACKGROUND: Transcriptome sequencing is a powerful tool for measuring gene expression, but, as with other technologies, various artifacts and biases affect the quantification. To correct some of them, several normalization approaches have emerged, differing both in the statistical strategy employed and in the type of bias corrected. However, there is no clear standard normalization method. RESULTS: We present a novel methodology to normalize RNA-Seq data, taking into account transcript size, GC content, and sequencing depth, which are the major quantification-related biases. First, we found that transcripts shorter than 600 bp have an underestimated expression level, while for longer transcripts the overestimation increases with length. Second, it is well known that the higher the GC content (>50%), the more transcript expression is underestimated. Third, we demonstrated that sequencing depth affects the size bias and proposed a correction allowing the comparison of expression levels among many samples. The efficiency of our approach was then tested by comparing the correlation between normalized RNA-Seq data and qRT-PCR expression measurements. All the steps are automated in a program written in Perl and available on request. CONCLUSIONS: The methodology presented in this article identifies and corrects different biases that influence RNA-Seq quantification, and provides more accurate estimates of gene expression levels. This method can be applied to compare expression quantifications from many samples, preferably from the same tissue. To compare samples from different tissues, a calibration using several reference genes will be required.
BioMed Central 2014-06-14 /pmc/articles/PMC4067528/ /pubmed/24929920 http://dx.doi.org/10.1186/1471-2105-15-188 Text en Copyright © 2014 Filloux et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title An integrative method to normalize RNA-Seq data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4067528/
https://www.ncbi.nlm.nih.gov/pubmed/24929920
http://dx.doi.org/10.1186/1471-2105-15-188