HiCImpute: A Bayesian hierarchical model for identifying structural zeros and enhancing single cell Hi-C data

Bibliographic Details
Main authors: Xie, Qing; Han, Chenggong; Jin, Victor; Lin, Shili
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9232133/
https://www.ncbi.nlm.nih.gov/pubmed/35696429
http://dx.doi.org/10.1371/journal.pcbi.1010129
author Xie, Qing
Han, Chenggong
Jin, Victor
Lin, Shili
collection PubMed
description Single cell Hi-C techniques enable one to study cell-to-cell variability in chromatin interactions. However, single cell Hi-C (scHi-C) data suffer severely from sparsity, that is, the existence of excess zeros due to insufficient sequencing depth. Complicating the matter further is the fact that not all zeros are created equal: some are due to loci truly not interacting because of the underlying biological mechanism (structural zeros); others are indeed due to insufficient sequencing depth (sampling zeros or dropouts), especially for loci that interact infrequently. Differentiating between structural zeros and dropouts is important since correct inference would improve downstream analyses such as clustering and discovery of subtypes. Nevertheless, distinguishing between these two types of zeros has received little attention in the single cell Hi-C literature, where the issue of sparsity has been addressed mainly as a data quality improvement problem. To fill this gap, we propose HiCImpute, a Bayesian hierarchical model that goes beyond data quality improvement by also identifying observed zeros that are in fact structural zeros. HiCImpute takes spatial dependencies of the scHi-C 2D data structure into account while also borrowing information from similar single cells and bulk data, when available. Through an extensive set of analyses of synthetic and real data, we demonstrate the ability of HiCImpute to identify structural zeros with high sensitivity and to impute dropout values accurately. Downstream analyses using data improved by HiCImpute yielded much more accurate clustering of cell types compared to using observed data or data improved by several comparison methods. Most significantly, HiCImpute-improved data have led to the identification of subtypes within each of the excitatory neuronal cells of L4 and L5 in the prefrontal cortex.
format Online
Article
Text
id pubmed-9232133
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9232133 2022-06-25 HiCImpute: A Bayesian hierarchical model for identifying structural zeros and enhancing single cell Hi-C data Xie, Qing; Han, Chenggong; Jin, Victor; Lin, Shili. PLoS Comput Biol, Research Article. Public Library of Science 2022-06-13 /pmc/articles/PMC9232133/ /pubmed/35696429 http://dx.doi.org/10.1371/journal.pcbi.1010129 Text en © 2022 Xie et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
topic Research Article