Leveraging gene correlations in single cell transcriptomic data

BACKGROUND: Many approaches have been developed to overcome technical noise in single cell RNA-sequencing (scRNAseq). As researchers dig deeper into data—looking for rare cell types, subtleties of cell states, and details of gene regulatory networks—there is a growing need for algorithms with contro...

Bibliographic Details
Main authors:	Silkwood, Kai; Dollinger, Emmanuel; Gervin, Josh; Atwood, Scott; Nie, Qing; Lander, Arthur D.
Format:	Online Article Text
Language:	English
Published:	Cold Spring Harbor Laboratory 2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10055147/ https://www.ncbi.nlm.nih.gov/pubmed/36993765 http://dx.doi.org/10.1101/2023.03.14.532643

_version_	1785015830166110208
author	Silkwood, Kai Dollinger, Emmanuel Gervin, Josh Atwood, Scott Nie, Qing Lander, Arthur D.
author_facet	Silkwood, Kai Dollinger, Emmanuel Gervin, Josh Atwood, Scott Nie, Qing Lander, Arthur D.
author_sort	Silkwood, Kai
collection	PubMed
description	BACKGROUND: Many approaches have been developed to overcome technical noise in single cell RNA-sequencing (scRNAseq). As researchers dig deeper into data—looking for rare cell types, subtleties of cell states, and details of gene regulatory networks—there is a growing need for algorithms with controllable accuracy and fewer ad hoc parameters and thresholds. Impeding this goal is the fact that an appropriate null distribution for scRNAseq cannot simply be extracted from data when ground truth about biological variation is unknown (i.e., usually). RESULTS: We approach this problem analytically, assuming that scRNAseq data reflect only cell heterogeneity (what we seek to characterize), transcriptional noise (temporal fluctuations randomly distributed across cells), and sampling error (i.e., Poisson noise). We analyze scRNAseq data without normalization—a step that skews distributions, particularly for sparse data—and calculate p-values associated with key statistics. We develop an improved method for selecting features for cell clustering and identifying gene-gene correlations, both positive and negative. Using simulated data, we show that this method, which we call BigSur (Basic Informatics and Gene Statistics from Unnormalized Reads), captures even weak yet significant correlation structures in scRNAseq data. Applying BigSur to data from a clonal human melanoma cell line, we identify thousands of correlations that, when clustered without supervision into gene communities, align with known cellular components and biological processes, and highlight potentially novel cell biological relationships. CONCLUSIONS: New insights into functionally relevant gene regulatory networks can be obtained using a statistically grounded approach to the identification of gene-gene correlations.
format	Online Article Text
id	pubmed-10055147
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Cold Spring Harbor Laboratory
record_format	MEDLINE/PubMed
spelling	pubmed-100551472023-03-30 Leveraging gene correlations in single cell transcriptomic data Silkwood, Kai Dollinger, Emmanuel Gervin, Josh Atwood, Scott Nie, Qing Lander, Arthur D. bioRxiv Article BACKGROUND: Many approaches have been developed to overcome technical noise in single cell RNA-sequencing (scRNAseq). As researchers dig deeper into data—looking for rare cell types, subtleties of cell states, and details of gene regulatory networks—there is a growing need for algorithms with controllable accuracy and fewer ad hoc parameters and thresholds. Impeding this goal is the fact that an appropriate null distribution for scRNAseq cannot simply be extracted from data when ground truth about biological variation is unknown (i.e., usually). RESULTS: We approach this problem analytically, assuming that scRNAseq data reflect only cell heterogeneity (what we seek to characterize), transcriptional noise (temporal fluctuations randomly distributed across cells), and sampling error (i.e., Poisson noise). We analyze scRNAseq data without normalization—a step that skews distributions, particularly for sparse data—and calculate p-values associated with key statistics. We develop an improved method for selecting features for cell clustering and identifying gene-gene correlations, both positive and negative. Using simulated data, we show that this method, which we call BigSur (Basic Informatics and Gene Statistics from Unnormalized Reads), captures even weak yet significant correlation structures in scRNAseq data. Applying BigSur to data from a clonal human melanoma cell line, we identify thousands of correlations that, when clustered without supervision into gene communities, align with known cellular components and biological processes, and highlight potentially novel cell biological relationships. 
CONCLUSIONS: New insights into functionally relevant gene regulatory networks can be obtained using a statistically grounded approach to the identification of gene-gene correlations. Cold Spring Harbor Laboratory 2023-11-01 /pmc/articles/PMC10055147/ /pubmed/36993765 http://dx.doi.org/10.1101/2023.03.14.532643 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.
spellingShingle	Article Silkwood, Kai Dollinger, Emmanuel Gervin, Josh Atwood, Scott Nie, Qing Lander, Arthur D. Leveraging gene correlations in single cell transcriptomic data
title	Leveraging gene correlations in single cell transcriptomic data
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10055147/ https://www.ncbi.nlm.nih.gov/pubmed/36993765 http://dx.doi.org/10.1101/2023.03.14.532643
