Benchmarking automated cell type annotation tools for single-cell ATAC-seq data

Bibliographic Details
Main authors: Wang, Yuge; Sun, Xingzhi; Zhao, Hongyu
Format: Online Article Text
Language: English
Published: Frontiers Media S.A., 2022
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9792779/
https://www.ncbi.nlm.nih.gov/pubmed/36583014
http://dx.doi.org/10.3389/fgene.2022.1063233
author Wang, Yuge
Sun, Xingzhi
Zhao, Hongyu
collection PubMed
description As single-cell chromatin accessibility profiling methods advance, scATAC-seq has become increasingly important for studying candidate regulatory genomic regions and their roles in developmental, evolutionary, and disease processes. At the same time, cell type annotation is critical for understanding the cellular composition of complex tissues and for identifying potential novel cell types. However, most existing automated cell type annotation methods are designed to transfer labels from one annotated scRNA-seq dataset to another scRNA-seq dataset, and it is unclear whether they can be adapted to annotate scATAC-seq data. Several methods have recently been proposed for label transfer from scRNA-seq to scATAC-seq data, but their performance has not been systematically benchmarked. Here, we evaluated five scATAC-seq annotation methods for both classification accuracy and scalability, using publicly available single-cell datasets from mouse and human tissues, including brain, lung, kidney, peripheral blood mononuclear cells (PBMC), and bone marrow mononuclear cells (BMMC). Using the BMMC data as a basis, we further investigated how these methods perform across different data sizes, mislabeling rates, sequencing depths, and numbers of cell types unique to scATAC-seq. Bridge integration, the only method that requires additional multimodal data and does not need gene activity calculation, was the best method overall and was robust to changes in data size, mislabeling rate, and sequencing depth. Conos was the most time- and memory-efficient method but had the worst prediction accuracy. scJoint tended to assign cells to similar cell types and performed relatively poorly on complex datasets with deep annotations, but performed better on datasets annotated only with major cell type labels. scGCN and Seurat v3 performed moderately, but scGCN was the most time-consuming method and, for cell types unique to scATAC-seq, performed most similarly to a random classifier.
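The abstract names the perturbations used to stress-test the methods (reference mislabeling rate and sequencing depth) but not how they were simulated or scored. The Python sketch below shows one plausible setup under stated assumptions: mislabeling is modeled as reassigning a fraction of reference labels to a different, uniformly chosen cell type; lower depth is modeled as binomial thinning of a cell-by-peak count matrix; and predictions are scored with accuracy and macro-F1. All function names are hypothetical, and this is not the authors' benchmarking code.

```python
import random
from typing import List, Sequence


def corrupt_labels(labels: Sequence[str], rate: float, seed: int = 0) -> List[str]:
    """Hypothetical helper: reassign a fraction `rate` of reference labels to a
    different, uniformly chosen cell type (an assumed model of mislabeling)."""
    rng = random.Random(seed)
    types = sorted(set(labels))
    noisy = list(labels)
    if len(types) < 2:
        return noisy  # only one cell type: there is nothing to swap to
    n_flip = round(rate * len(noisy))
    # Pick distinct cells, then move each to a type other than its current one.
    for i in rng.sample(range(len(noisy)), n_flip):
        noisy[i] = rng.choice([t for t in types if t != noisy[i]])
    return noisy


def thin_counts(counts: List[List[int]], keep: float, seed: int = 0) -> List[List[int]]:
    """Hypothetical helper: keep each read independently with probability `keep`
    (binomial thinning, an assumed model of lower sequencing depth)."""
    rng = random.Random(seed)
    return [[sum(rng.random() < keep for _ in range(c)) for c in row] for row in counts]


def accuracy(true: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of query cells whose predicted label matches the ground truth."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def macro_f1(true: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-cell-type F1 scores, so rare types count equally."""
    f1s = []
    for lab in sorted(set(true) | set(pred)):
        tp = sum(t == lab and p == lab for t, p in zip(true, pred))
        fp = sum(t != lab and p == lab for t, p in zip(true, pred))
        fn = sum(t == lab and p != lab for t, p in zip(true, pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)
```

For example, `corrupt_labels(ref, rate=0.2)` on a 10-cell reference changes exactly 2 labels, because each sampled cell is moved to a type other than its own. Macro-F1 is shown alongside accuracy because accuracy alone can look favorable when abundant cell types dominate; the abstract does not state which metrics the study used.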
format Online
Article
Text
id pubmed-9792779
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9792779 2022-12-28 Front Genet Genetics Frontiers Media S.A. 2022-12-13 /pmc/articles/PMC9792779/ /pubmed/36583014 http://dx.doi.org/10.3389/fgene.2022.1063233 Text en Copyright © 2022 Wang, Sun and Zhao. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Benchmarking automated cell type annotation tools for single-cell ATAC-seq data
topic Genetics