Modeling zero inflation is not necessary for spatial transcriptomics
Main authors: | Zhao, Peiyao; Zhu, Jiaqiang; Ma, Ying; Zhou, Xiang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9116027/ https://www.ncbi.nlm.nih.gov/pubmed/35585605 http://dx.doi.org/10.1186/s13059-022-02684-0 |
_version_ | 1784710037929721856 |
---|---|
author | Zhao, Peiyao Zhu, Jiaqiang Ma, Ying Zhou, Xiang |
author_facet | Zhao, Peiyao Zhu, Jiaqiang Ma, Ying Zhou, Xiang |
author_sort | Zhao, Peiyao |
collection | PubMed |
description | BACKGROUND: Spatial transcriptomics are a set of new technologies that profile gene expression on tissues with spatial localization information. With technological advances, recent spatial transcriptomics data are often in the form of sparse counts with an excessive amount of zero values. RESULTS: We perform a comprehensive analysis on 20 spatial transcriptomics datasets collected from 11 distinct technologies to characterize the distributional properties of the expression count data and understand the statistical nature of the zero values. Across datasets, we show that a substantial fraction of genes displays overdispersion and/or zero inflation that cannot be accounted for by a Poisson model, with genes displaying overdispersion substantially overlapped with genes displaying zero inflation. In addition, we find that either the Poisson or the negative binomial model is sufficient for modeling the majority of genes across most spatial transcriptomics technologies. We further show major sources of overdispersion and zero inflation in spatial transcriptomics including gene expression heterogeneity across tissue locations and spatial distribution of cell types. In particular, when we focus on a relatively homogeneous set of tissue locations or control for cell type compositions, the number of detected overdispersed and/or zero-inflated genes is substantially reduced, and a simple Poisson model is often sufficient to fit the gene expression data there. CONCLUSIONS: Our study provides the first comprehensive evidence that excessive zeros in spatial transcriptomics are not due to zero inflation, supporting the use of count models without a zero inflation component for modeling spatial transcriptomics. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-022-02684-0. |
format | Online Article Text |
id | pubmed-9116027 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9116027 2022-05-19 Modeling zero inflation is not necessary for spatial transcriptomics Zhao, Peiyao Zhu, Jiaqiang Ma, Ying Zhou, Xiang Genome Biol Research BACKGROUND: Spatial transcriptomics are a set of new technologies that profile gene expression on tissues with spatial localization information. With technological advances, recent spatial transcriptomics data are often in the form of sparse counts with an excessive amount of zero values. RESULTS: We perform a comprehensive analysis on 20 spatial transcriptomics datasets collected from 11 distinct technologies to characterize the distributional properties of the expression count data and understand the statistical nature of the zero values. Across datasets, we show that a substantial fraction of genes displays overdispersion and/or zero inflation that cannot be accounted for by a Poisson model, with genes displaying overdispersion substantially overlapped with genes displaying zero inflation. In addition, we find that either the Poisson or the negative binomial model is sufficient for modeling the majority of genes across most spatial transcriptomics technologies. We further show major sources of overdispersion and zero inflation in spatial transcriptomics including gene expression heterogeneity across tissue locations and spatial distribution of cell types. In particular, when we focus on a relatively homogeneous set of tissue locations or control for cell type compositions, the number of detected overdispersed and/or zero-inflated genes is substantially reduced, and a simple Poisson model is often sufficient to fit the gene expression data there. CONCLUSIONS: Our study provides the first comprehensive evidence that excessive zeros in spatial transcriptomics are not due to zero inflation, supporting the use of count models without a zero inflation component for modeling spatial transcriptomics. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-022-02684-0. BioMed Central 2022-05-18 /pmc/articles/PMC9116027/ /pubmed/35585605 http://dx.doi.org/10.1186/s13059-022-02684-0 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Zhao, Peiyao Zhu, Jiaqiang Ma, Ying Zhou, Xiang Modeling zero inflation is not necessary for spatial transcriptomics |
title | Modeling zero inflation is not necessary for spatial transcriptomics |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9116027/ https://www.ncbi.nlm.nih.gov/pubmed/35585605 http://dx.doi.org/10.1186/s13059-022-02684-0 |