Modeling expression ranks for noise-tolerant differential expression analysis of scRNA-seq data

Systematic delineation of complex biological systems is an ever-challenging and resource-intensive process. Single-cell transcriptomics allows us to study cell-to-cell variability in complex tissues at an unprecedented resolution. Accurate modeling of gene expression plays a critical role in the sta...

Bibliographic Details
Main Authors: Gupta, Krishan; Lalit, Manan; Biswas, Aditya; Sanada, Chad D.; Greene, Cassandra; Hukari, Kyle; Maulik, Ujjwal; Bandyopadhyay, Sanghamitra; Ramalingam, Naveen; Ahuja, Gaurav; Ghosh, Abhik; Sengupta, Debarka
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8015842/
https://www.ncbi.nlm.nih.gov/pubmed/33674351
http://dx.doi.org/10.1101/gr.267070.120
_version_ 1783673757388242944
author Gupta, Krishan
Lalit, Manan
Biswas, Aditya
Sanada, Chad D.
Greene, Cassandra
Hukari, Kyle
Maulik, Ujjwal
Bandyopadhyay, Sanghamitra
Ramalingam, Naveen
Ahuja, Gaurav
Ghosh, Abhik
Sengupta, Debarka
author_facet Gupta, Krishan
Lalit, Manan
Biswas, Aditya
Sanada, Chad D.
Greene, Cassandra
Hukari, Kyle
Maulik, Ujjwal
Bandyopadhyay, Sanghamitra
Ramalingam, Naveen
Ahuja, Gaurav
Ghosh, Abhik
Sengupta, Debarka
author_sort Gupta, Krishan
collection PubMed
description Systematic delineation of complex biological systems is an ever-challenging and resource-intensive process. Single-cell transcriptomics allows us to study cell-to-cell variability in complex tissues at an unprecedented resolution. Accurate modeling of gene expression plays a critical role in the statistical determination of tissue-specific gene expression patterns. In the past few years, considerable efforts have been made to identify appropriate parametric models for single-cell expression data. The zero-inflated version of Poisson/negative binomial and log-normal distributions have emerged as the most popular alternatives owing to their ability to accommodate high dropout rates, as commonly observed in single-cell data. Although the majority of the parametric approaches directly model expression estimates, we explore the potential of modeling expression ranks, as robust surrogates for transcript abundance. Here we examined the performance of the discrete generalized beta distribution (DGBD) on real data and devised a Wald-type test for comparing gene expression across two phenotypically divergent groups of single cells. We performed a comprehensive assessment of the proposed method to understand its advantages compared with some of the existing best-practice approaches. We concluded that besides striking a reasonable balance between Type I and Type II errors, ROSeq, the proposed differential expression test, is exceptionally robust to expression noise and scales rapidly with increasing sample size. For wider dissemination and adoption of the method, we created an R package called ROSeq and made it available on the Bioconductor platform.
format Online
Article
Text
id pubmed-8015842
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-80158422021-10-01 Modeling expression ranks for noise-tolerant differential expression analysis of scRNA-seq data Gupta, Krishan Lalit, Manan Biswas, Aditya Sanada, Chad D. Greene, Cassandra Hukari, Kyle Maulik, Ujjwal Bandyopadhyay, Sanghamitra Ramalingam, Naveen Ahuja, Gaurav Ghosh, Abhik Sengupta, Debarka Genome Res Method Systematic delineation of complex biological systems is an ever-challenging and resource-intensive process. Single-cell transcriptomics allows us to study cell-to-cell variability in complex tissues at an unprecedented resolution. Accurate modeling of gene expression plays a critical role in the statistical determination of tissue-specific gene expression patterns. In the past few years, considerable efforts have been made to identify appropriate parametric models for single-cell expression data. The zero-inflated version of Poisson/negative binomial and log-normal distributions have emerged as the most popular alternatives owing to their ability to accommodate high dropout rates, as commonly observed in single-cell data. Although the majority of the parametric approaches directly model expression estimates, we explore the potential of modeling expression ranks, as robust surrogates for transcript abundance. Here we examined the performance of the discrete generalized beta distribution (DGBD) on real data and devised a Wald-type test for comparing gene expression across two phenotypically divergent groups of single cells. We performed a comprehensive assessment of the proposed method to understand its advantages compared with some of the existing best-practice approaches. We concluded that besides striking a reasonable balance between Type I and Type II errors, ROSeq, the proposed differential expression test, is exceptionally robust to expression noise and scales rapidly with increasing sample size. 
For wider dissemination and adoption of the method, we created an R package called ROSeq and made it available on the Bioconductor platform. Cold Spring Harbor Laboratory Press 2021-04 /pmc/articles/PMC8015842/ /pubmed/33674351 http://dx.doi.org/10.1101/gr.267070.120 Text en © 2021 Gupta et al.; Published by Cold Spring Harbor Laboratory Press http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
spellingShingle Method
Gupta, Krishan
Lalit, Manan
Biswas, Aditya
Sanada, Chad D.
Greene, Cassandra
Hukari, Kyle
Maulik, Ujjwal
Bandyopadhyay, Sanghamitra
Ramalingam, Naveen
Ahuja, Gaurav
Ghosh, Abhik
Sengupta, Debarka
Modeling expression ranks for noise-tolerant differential expression analysis of scRNA-seq data
title Modeling expression ranks for noise-tolerant differential expression analysis of scRNA-seq data
title_full Modeling expression ranks for noise-tolerant differential expression analysis of scRNA-seq data
title_fullStr Modeling expression ranks for noise-tolerant differential expression analysis of scRNA-seq data
title_full_unstemmed Modeling expression ranks for noise-tolerant differential expression analysis of scRNA-seq data
title_short Modeling expression ranks for noise-tolerant differential expression analysis of scRNA-seq data
title_sort modeling expression ranks for noise-tolerant differential expression analysis of scrna-seq data
topic Method
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8015842/
https://www.ncbi.nlm.nih.gov/pubmed/33674351
http://dx.doi.org/10.1101/gr.267070.120