
RNA-sequence data normalization through in silico prediction of reference genes: the bacterial response to DNA damage as case study

BACKGROUND: Measuring how gene expression changes in the course of an experiment assesses how an organism responds on a molecular level. Sequencing of RNA molecules, and their subsequent quantification, aims to assess global gene expression changes on the RNA level (transcriptome). While advances in...


Bibliographic Details
Main authors: Berghoff, Bork A., Karlsson, Torgny, Källman, Thomas, Wagner, E. Gerhart H., Grabherr, Manfred G.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5584328/
https://www.ncbi.nlm.nih.gov/pubmed/28878825
http://dx.doi.org/10.1186/s13040-017-0150-8
author Berghoff, Bork A.
Karlsson, Torgny
Källman, Thomas
Wagner, E. Gerhart H.
Grabherr, Manfred G.
author_sort Berghoff, Bork A.
collection PubMed
description BACKGROUND: Measuring how gene expression changes in the course of an experiment assesses how an organism responds on a molecular level. Sequencing of RNA molecules, and their subsequent quantification, aims to assess global gene expression changes on the RNA level (transcriptome). While advances in high-throughput RNA-sequencing (RNA-seq) technologies allow for inexpensive data generation, accurate post-processing and normalization across samples are required to eliminate any systematic noise introduced by the biochemical and/or technical processes. Existing methods thus either normalize on selected known reference genes that are invariant in expression across the experiment, assume that the majority of genes are invariant, or assume that the effects of up- and down-regulated genes cancel each other out during the normalization. RESULTS: Here, we present a novel method, moose2, which predicts invariant genes in silico through a dynamic programming (DP) scheme and applies a quadratic normalization based on this subset. The method allows for specifying a set of known or experimentally validated invariant genes, which guides the DP. We experimentally verified the predictions of this method in the bacterium Escherichia coli, and show how moose2 is able to (i) estimate the expression value distances between RNA-seq samples, (ii) reduce the variation of expression values across all samples, and (iii) subsequently reveal new functional groups of genes during the late stages of DNA damage. We further applied the method to three eukaryotic data sets, on which its performance compares favourably to other methods. The software is implemented in C++ and is publicly available from http://grabherr.github.io/moose2/.
CONCLUSIONS: The proposed RNA-seq normalization method, moose2, is a valuable alternative to existing methods, with two major advantages: (i) in silico prediction of invariant genes provides a list of potential reference genes for downstream analyses, and (ii) non-linear artefacts in RNA-seq data are handled adequately to minimize variations between replicates. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13040-017-0150-8) contains supplementary material, which is available to authorized users.
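The description above mentions a quadratic normalization based on a subset of invariant genes but gives no formulas. The following is only a minimal sketch of that general idea, not the published implementation (which is written in C++). It assumes that the invariant genes are already known, that counts are compared on a log2 scale with a pseudocount, and that one sample is mapped onto a chosen reference sample; the function names `fit_quadratic` and `normalize_sample` are hypothetical, and the dynamic programming step that predicts invariant genes is not reproduced here.

```python
import math


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x^2; returns (a, b, c)."""
    # Build the augmented 3x3 normal equations (X^T X | X^T y).
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("need at least three distinct x values")
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def normalize_sample(sample, reference, invariant_genes, pseudocount=1.0):
    """Rescale `sample` counts onto the scale of `reference`.

    The quadratic is fitted only on the (assumed) invariant genes and
    then applied to every gene in the sample.
    """
    xs = [math.log2(sample[g] + pseudocount) for g in invariant_genes]
    ys = [math.log2(reference[g] + pseudocount) for g in invariant_genes]
    a, b, c = fit_quadratic(xs, ys)
    normalized = {}
    for gene, count in sample.items():
        x = math.log2(count + pseudocount)
        normalized[gene] = 2 ** (a + b * x + c * x * x) - pseudocount
    return normalized


# Toy data (invented for illustration): the reference is sequenced
# roughly twice as deeply as the sample.
sample = {"g1": 10, "g2": 100, "g3": 1000, "g4": 50}
reference = {"g1": 21, "g2": 201, "g3": 2001, "g4": 400}
result = normalize_sample(sample, reference, ["g1", "g2", "g3"])
```

In this toy case `g4` is not in the invariant set, so its normalized value can be compared against the reference count to judge whether it changed. With real data, the choice of reference sample and pseudocount would need to be justified separately.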
format Online
Article
Text
id pubmed-5584328
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5584328 2017-09-06 BioData Min Research BioMed Central 2017-09-05 /pmc/articles/PMC5584328/ /pubmed/28878825 http://dx.doi.org/10.1186/s13040-017-0150-8 Text en © The Author(s). 2017 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title RNA-sequence data normalization through in silico prediction of reference genes: the bacterial response to DNA damage as case study
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5584328/
https://www.ncbi.nlm.nih.gov/pubmed/28878825
http://dx.doi.org/10.1186/s13040-017-0150-8