Investigating the Complexity of Gene Co-expression Estimation for Single-cell Data

With the rapid advance of single-cell RNA sequencing (scRNA-seq) technology, understanding biological processes at a more refined single-cell level is becoming possible. Gene co-expression estimation is an essential step in this direction. It can annotate functionalities of unknown genes or construc...

Bibliographic Details
Main authors: Zhang, Jiaqi; Singh, Ritambhara
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9900775/
https://www.ncbi.nlm.nih.gov/pubmed/36747724
http://dx.doi.org/10.1101/2023.01.24.525447
_version_ 1784882916227022848
author Zhang, Jiaqi
Singh, Ritambhara
author_facet Zhang, Jiaqi
Singh, Ritambhara
author_sort Zhang, Jiaqi
collection PubMed
description With the rapid advance of single-cell RNA sequencing (scRNA-seq) technology, understanding biological processes at a more refined single-cell level is becoming possible. Gene co-expression estimation is an essential step in this direction. It can annotate functionalities of unknown genes or construct the basis of gene regulatory network inference. This study thoroughly tests the existing gene co-expression estimation methods on simulation datasets with known ground truth co-expression networks. We generate these novel datasets using two simulation processes that use the parameters learned from the experimental data. We demonstrate that these simulations better capture the underlying properties of the real-world single-cell datasets than previously tested simulations for the task. Our performance results on tens of simulated and eight experimental datasets show that all methods produce estimations with a high false discovery rate potentially caused by high-sparsity levels in the data. Finally, we find that commonly used pre-processing approaches, such as normalization and imputation, do not improve the co-expression estimation. Overall, our benchmark setup contributes to the co-expression estimator development, and our study provides valuable insights for the community of single-cell data analyses.
format Online
Article
Text
id pubmed-9900775
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-99007752023-02-07 Investigating the Complexity of Gene Co-expression Estimation for Single-cell Data Zhang, Jiaqi Singh, Ritambhara bioRxiv Article With the rapid advance of single-cell RNA sequencing (scRNA-seq) technology, understanding biological processes at a more refined single-cell level is becoming possible. Gene co-expression estimation is an essential step in this direction. It can annotate functionalities of unknown genes or construct the basis of gene regulatory network inference. This study thoroughly tests the existing gene co-expression estimation methods on simulation datasets with known ground truth co-expression networks. We generate these novel datasets using two simulation processes that use the parameters learned from the experimental data. We demonstrate that these simulations better capture the underlying properties of the real-world single-cell datasets than previously tested simulations for the task. Our performance results on tens of simulated and eight experimental datasets show that all methods produce estimations with a high false discovery rate potentially caused by high-sparsity levels in the data. Finally, we find that commonly used pre-processing approaches, such as normalization and imputation, do not improve the co-expression estimation. Overall, our benchmark setup contributes to the co-expression estimator development, and our study provides valuable insights for the community of single-cell data analyses. 
Cold Spring Harbor Laboratory 2023-01-25 /pmc/articles/PMC9900775/ /pubmed/36747724 http://dx.doi.org/10.1101/2023.01.24.525447 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.
spellingShingle Article
Zhang, Jiaqi
Singh, Ritambhara
Investigating the Complexity of Gene Co-expression Estimation for Single-cell Data
title Investigating the Complexity of Gene Co-expression Estimation for Single-cell Data
title_full Investigating the Complexity of Gene Co-expression Estimation for Single-cell Data
title_fullStr Investigating the Complexity of Gene Co-expression Estimation for Single-cell Data
title_full_unstemmed Investigating the Complexity of Gene Co-expression Estimation for Single-cell Data
title_short Investigating the Complexity of Gene Co-expression Estimation for Single-cell Data
title_sort investigating the complexity of gene co-expression estimation for single-cell data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9900775/
https://www.ncbi.nlm.nih.gov/pubmed/36747724
http://dx.doi.org/10.1101/2023.01.24.525447
work_keys_str_mv AT zhangjiaqi investigatingthecomplexityofgenecoexpressionestimationforsinglecelldata
AT singhritambhara investigatingthecomplexityofgenecoexpressionestimationforsinglecelldata