Matrix prior for data transfer between single cell data types in latent Dirichlet allocation

Single cell ATAC-seq (scATAC-seq) enables the mapping of regulatory elements in fine-grained cell types. Despite this advance, analysis of the resulting data is challenging, and large scale scATAC-seq data are difficult to obtain and expensive to generate. This motivates a method to leverage informa...

Bibliographic Details
Main authors: Min, Alan, Durham, Timothy, Gevirtzman, Louis, Noble, William Stafford
Format: Online Article Text
Language: English
Published: Public Library of Science 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10191269/
https://www.ncbi.nlm.nih.gov/pubmed/37146053
http://dx.doi.org/10.1371/journal.pcbi.1011049
author Min, Alan
Durham, Timothy
Gevirtzman, Louis
Noble, William Stafford
collection PubMed
description Single cell ATAC-seq (scATAC-seq) enables the mapping of regulatory elements in fine-grained cell types. Despite this advance, analysis of the resulting data is challenging, and large scale scATAC-seq data are difficult to obtain and expensive to generate. This motivates a method to leverage information from previously generated large scale scATAC-seq or scRNA-seq data to guide our analysis of new scATAC-seq datasets. We analyze scATAC-seq data using latent Dirichlet allocation (LDA), a Bayesian algorithm that was developed to model text corpora, summarizing documents as mixtures of topics defined based on the words that distinguish the documents. When applied to scATAC-seq, LDA treats cells as documents and their accessible sites as words, identifying “topics” based on the cell type-specific accessible sites in those cells. Previous work used uniform symmetric priors in LDA, but we hypothesized that nonuniform matrix priors generated from LDA models trained on existing data sets may enable improved detection of cell types in new data sets, especially if they have relatively few cells. In this work, we test this hypothesis in scATAC-seq data from whole C. elegans nematodes and SHARE-seq data from mouse skin cells. We show that nonsymmetric matrix priors for LDA improve our ability to capture cell type information from small scATAC-seq datasets.
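The idea summarized in the description (cells as documents, accessible sites as words, and a nonuniform topic-by-site prior matrix derived from a reference model) can be illustrated with a small sketch. This is not the authors' implementation: it is a plain collapsed Gibbs sampler for LDA written only with the Python standard library, and the helper `matrix_prior`, its `scale` and `base` parameters, and the toy data are all hypothetical choices made for illustration.

```python
import random


def matrix_prior(ref_topic_site_counts, scale=50.0, base=0.01):
    """Build a topic-by-site prior matrix from a reference model's counts.

    Hypothetical construction (an assumption, not taken from the article):
    normalize each topic row to sum to 1, multiply by `scale`, and add
    `base` so that no pseudo-count is exactly zero.
    """
    prior = []
    for row in ref_topic_site_counts:
        total = float(sum(row)) or 1.0
        prior.append([base + scale * count / total for count in row])
    return prior


def gibbs_lda(cells, n_sites, eta, alpha=1.0, n_iter=100, seed=0):
    """Collapsed Gibbs sampling for LDA with a matrix prior on topics.

    cells: list of cells, each a list of accessible-site indices ("words").
    eta:   n_topics x n_sites matrix; eta[k][w] is the pseudo-count of
           site w in topic k (a symmetric prior would repeat one value).
    Returns (cell_topic_counts, topic_site_counts).
    """
    rng = random.Random(seed)
    n_topics = len(eta)
    eta_row_sum = [sum(row) for row in eta]
    ndk = [[0] * n_topics for _ in cells]           # cell-by-topic counts
    nkw = [[0] * n_sites for _ in range(n_topics)]  # topic-by-site counts
    nk = [0] * n_topics                             # tokens per topic
    z = []                                          # topic of every token

    # Random initial topic assignment for every (cell, site) token.
    for d, cell in enumerate(cells):
        z_d = []
        for w in cell:
            k = rng.randrange(n_topics)
            z_d.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, cell in enumerate(cells):
            for i, w in enumerate(cell):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Conditional weight of each topic; eta[t][w] replaces the
                # single symmetric value used with a uniform prior.
                weights = [
                    (ndk[d][t] + alpha)
                    * (nkw[t][w] + eta[t][w])
                    / (nk[t] + eta_row_sum[t])
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return ndk, nkw


# Toy example: a "reference" model with two topics over six sites, and a
# small new dataset of four cells (all numbers are made up).
reference_counts = [[30, 25, 28, 0, 0, 0], [0, 0, 0, 27, 31, 29]]
eta = matrix_prior(reference_counts)
cells = [[0, 1, 2, 0, 1], [3, 4, 5, 4, 3], [0, 2, 2, 1], [5, 3, 4, 5]]
cell_topics, topic_sites = gibbs_lda(cells, n_sites=6, eta=eta)
print(cell_topics)
```

Because each row of `eta` concentrates its pseudo-counts on the sites that the reference topic favored, topic identities in the new, small dataset tend to line up with those of the reference model, which is the intuition behind transferring information through the prior.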
format Online
Article
Text
id pubmed-10191269
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-10191269 2023-05-18 Matrix prior for data transfer between single cell data types in latent Dirichlet allocation. Min, Alan; Durham, Timothy; Gevirtzman, Louis; Noble, William Stafford. PLoS Comput Biol. Research Article.
Public Library of Science 2023-05-05 /pmc/articles/PMC10191269/ /pubmed/37146053 http://dx.doi.org/10.1371/journal.pcbi.1011049 Text en © 2023 Min et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Matrix prior for data transfer between single cell data types in latent Dirichlet allocation
topic Research Article