MUREN: a robust and multi-reference approach of RNA-seq transcript normalization

BACKGROUND: Normalization of RNA-seq data aims at identifying biological expression differentiation between samples by removing the effects of unwanted confounding factors. Explicitly or implicitly, the justification of normalization requires a set of housekeeping genes. However, the existence of ho...

Bibliographic Details
Main Authors: Feng, Yance, Li, Lei M.
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8317383/
https://www.ncbi.nlm.nih.gov/pubmed/34320923
http://dx.doi.org/10.1186/s12859-021-04288-0
_version_ 1783730060062097408
author Feng, Yance
Li, Lei M.
author_facet Feng, Yance
Li, Lei M.
author_sort Feng, Yance
collection PubMed
description BACKGROUND: Normalization of RNA-seq data aims at identifying biological expression differentiation between samples by removing the effects of unwanted confounding factors. Explicitly or implicitly, the justification of normalization requires a set of housekeeping genes. However, the existence of housekeeping genes common to a very large collection of samples, especially under a wide range of conditions, is questionable. RESULTS: We propose to carry out pairwise normalization with respect to multiple references, selected from representative samples. The pairwise intermediates are then integrated based on a linear model that adjusts for the reference effects. Motivated by the notion of housekeeping genes and their statistical counterparts, we adopt robust least trimmed squares regression in pairwise normalization. The proposed method (MUREN) is compared with other existing tools on some standard data sets. Goodness of normalization is judged with emphasis on preserving possible asymmetric differentiation, whose biological significance is exemplified by a single-cell data set of the cell cycle. MUREN is implemented as an R package. The code, licensed under GPL-3, is available on GitHub: github.com/hippo-yf/MUREN and on conda: anaconda.org/hippo-yf/r-muren. CONCLUSIONS: MUREN performs RNA-seq normalization using a two-step statistical regression induced from a general principle. We propose that the densities of pairwise differentiations be used to evaluate the goodness of normalization. MUREN adjusts the mode of differentiation toward zero while preserving the skewness due to biological asymmetric differentiation. Moreover, by robustly integrating pre-normalized counts with respect to multiple references, MUREN is immune to individual outlier samples. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04288-0.
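The two-step idea in the description (robust pairwise normalization against several references, then integration that adjusts for reference effects) can be illustrated with a minimal Python sketch. This is not the MUREN package or its API. It assumes a pure scaling model on log2 counts, uses a univariate least trimmed squares (LTS) location estimate for each sample/reference pair, and swaps in a simple median polish for the integration step. The function names, the `keep` fraction, and the `ref_idx` argument are all hypothetical.

```python
import numpy as np


def lts_location(x, keep=0.5):
    """LTS estimate of a location: mean of the tightest window of sorted values.

    Assumption: `keep` is the fraction of values treated as non-differential.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    h = min(n, max(2, int(np.ceil(keep * n))))  # number of values retained
    # For a univariate location, the optimal h-subset is contiguous in sorted
    # order, so it is enough to scan all windows of length h.
    csum = np.concatenate(([0.0], np.cumsum(x)))
    csum2 = np.concatenate(([0.0], np.cumsum(x * x)))
    starts = np.arange(n - h + 1)
    s1 = csum[starts + h] - csum[starts]
    s2 = csum2[starts + h] - csum2[starts]
    sse = s2 - s1 * s1 / h  # within-window sum of squared deviations
    best = int(np.argmin(sse))
    return s1[best] / h


def pairwise_offsets(counts, ref_idx, keep=0.5):
    """Log2 offset of every sample against every chosen reference sample."""
    counts = np.asarray(counts, dtype=float)
    n_samples = counts.shape[1]
    d = np.zeros((n_samples, len(ref_idx)))
    for j, r in enumerate(ref_idx):
        for i in range(n_samples):
            # Use only genes detected in both the sample and the reference.
            both = (counts[:, i] > 0) & (counts[:, r] > 0)
            diff = np.log2(counts[both, i]) - np.log2(counts[both, r])
            d[i, j] = lts_location(diff, keep)
    return d


def multi_reference_offsets(counts, ref_idx, keep=0.5, n_iter=10):
    """Integrate pairwise offsets with the additive model d[i, r] = s_i + t_r.

    Median polish is an illustrative choice for the robust fit; the source
    only states that a linear model adjusts for the reference effects.
    """
    resid = pairwise_offsets(counts, ref_idx, keep)
    row = np.zeros(resid.shape[0])
    for _ in range(n_iter):
        r = np.median(resid, axis=1)
        row += r
        resid -= r[:, None]
        resid -= np.median(resid, axis=0)[None, :]  # remove reference effects
    return row - row.mean()  # centred log2 scaling offsets, one per sample
```

Under these assumptions, normalized counts would be obtained as `counts / 2.0 ** offsets[None, :]`. Because the trimmed fit ignores the genes that deviate most from the common offset, a block of genes up-regulated in one sample keeps its skewed log-ratio distribution after scaling rather than being averaged away.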
format Online
Article
Text
id pubmed-8317383
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8317383 2021-07-30 MUREN: a robust and multi-reference approach of RNA-seq transcript normalization Feng, Yance Li, Lei M. BMC Bioinformatics Methodology Article BioMed Central 2021-07-28 /pmc/articles/PMC8317383/ /pubmed/34320923 http://dx.doi.org/10.1186/s12859-021-04288-0 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Methodology Article
Feng, Yance
Li, Lei M.
MUREN: a robust and multi-reference approach of RNA-seq transcript normalization
title MUREN: a robust and multi-reference approach of RNA-seq transcript normalization
title_full MUREN: a robust and multi-reference approach of RNA-seq transcript normalization
title_fullStr MUREN: a robust and multi-reference approach of RNA-seq transcript normalization
title_full_unstemmed MUREN: a robust and multi-reference approach of RNA-seq transcript normalization
title_short MUREN: a robust and multi-reference approach of RNA-seq transcript normalization
title_sort muren: a robust and multi-reference approach of rna-seq transcript normalization
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8317383/
https://www.ncbi.nlm.nih.gov/pubmed/34320923
http://dx.doi.org/10.1186/s12859-021-04288-0
work_keys_str_mv AT fengyance murenarobustandmultireferenceapproachofrnaseqtranscriptnormalization
AT lileim murenarobustandmultireferenceapproachofrnaseqtranscriptnormalization