
Joint between-sample normalization and differential expression detection through ℓ(0)-regularized regression

BACKGROUND: A fundamental problem in RNA-seq data analysis is to identify genes or exons that are differentially expressed with varying experimental conditions based on the read counts. The relativeness of RNA-seq measurements makes the between-sample normalization of read counts an essential step i...


Bibliographic Details
Main authors: Liu, Kefei; Shen, Li; Jiang, Hui
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6886201/
https://www.ncbi.nlm.nih.gov/pubmed/31787074
http://dx.doi.org/10.1186/s12859-019-3070-4
author Liu, Kefei
Shen, Li
Jiang, Hui
collection PubMed
description BACKGROUND: A fundamental problem in RNA-seq data analysis is to identify genes or exons that are differentially expressed with varying experimental conditions based on the read counts. The relativeness of RNA-seq measurements makes the between-sample normalization of read counts an essential step in differential expression (DE) analysis. In most existing methods, the normalization step is performed prior to the DE analysis. Recently, Jiang and Zhan proposed a statistical method that introduces sample-specific normalization parameters into a joint model, allowing simultaneous normalization and differential expression analysis from log-transformed RNA-seq data. Furthermore, an ℓ(0) penalty is used to yield a sparse solution that selects a subset of DE genes. The experimental conditions are restricted to be categorical in their work. RESULTS: In this paper, we generalize Jiang and Zhan’s method to handle experimental conditions that are measured as continuous variables. As a result, genes with expression levels associated with a single covariate or multiple covariates can be detected. Because the problem is high-dimensional, non-differentiable and non-convex, we develop an efficient algorithm for model fitting. CONCLUSIONS: Experiments on synthetic data demonstrate that the proposed method outperforms existing methods in terms of detection accuracy when a large fraction of genes are differentially expressed in an asymmetric manner, and the performance gain becomes more substantial for larger sample sizes. We also apply our method to a real prostate cancer RNA-seq dataset to identify genes associated with pre-operative prostate-specific antigen (PSA) levels in patients.
format Online
Article
Text
id pubmed-6886201
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6886201 2019-12-11 Joint between-sample normalization and differential expression detection through ℓ(0)-regularized regression Liu, Kefei Shen, Li Jiang, Hui BMC Bioinformatics Research
BioMed Central 2019-12-02 /pmc/articles/PMC6886201/ /pubmed/31787074 http://dx.doi.org/10.1186/s12859-019-3070-4 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Joint between-sample normalization and differential expression detection through ℓ(0)-regularized regression
topic Research