scEMAIL: Universal and Source-free Annotation Method for scRNA-seq Data with Novel Cell-type Perception

Bibliographic Details
Main Authors: Wan, Hui; Chen, Liang; Deng, Minghua
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects: Method
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10025768/
https://www.ncbi.nlm.nih.gov/pubmed/36608843
http://dx.doi.org/10.1016/j.gpb.2022.12.008
author Wan, Hui
Chen, Liang
Deng, Minghua
collection PubMed
description Current cell-type annotation tools for single-cell RNA sequencing (scRNA-seq) data mainly utilize well-annotated source data to help identify cell types in target data. However, on account of privacy preservation, their requirements for raw source data may not always be satisfied. In this case, achieving feature alignment between source and target data explicitly is impossible. Additionally, these methods are barely able to discover the presence of novel cell types. A subjective threshold is often selected by users to detect novel cells. We propose a universal annotation framework for scRNA-seq data called scEMAIL, which automatically detects novel cell types without accessing source data during adaptation. For new cell-type identification, a novel cell-type perception module is designed with three steps. First, an expert ensemble system measures uncertainty of each cell from three complementary aspects. Second, based on this measurement, bimodality tests are applied to detect the presence of new cell types. Third, once assured of their presence, an adaptive threshold via manifold mixup partitions target cells into “known” and “unknown” groups. Model adaptation is then conducted to alleviate the batch effect. We gather multi-order neighborhood messages globally and impose local affinity regularizations on “known” cells. These constraints mitigate wrong classifications of the source model via reliable self-supervised information of neighbors. scEMAIL is accurate and robust under various scenarios in both simulation and real data. It is also flexible to be applied to challenging single-cell ATAC-seq data without loss of superiority. The source code of scEMAIL can be accessed at https://github.com/aster-ww/scEMAIL and https://ngdc.cncb.ac.cn/biocode/tools/BT007335/releases/v1.0.
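The detection step in the description above (score each cell's uncertainty, then test whether the scores are bimodal) can be illustrated with a minimal sketch. It is not the scEMAIL implementation: this record does not say which three aspects the expert ensemble uses or which bimodality tests are applied, so normalized prediction entropy and Sarle's bimodality coefficient (computed here from plain, bias-uncorrected sample moments) are stand-ins chosen for illustration. The function names, the 5/9 cutoff, and the toy scores are all assumptions, and the manifold-mixup threshold is not attempted.

```python
import math
import random


def entropy_uncertainty(probs):
    """Normalized Shannon entropy of one cell's class probabilities.

    Hypothetical stand-in for a single uncertainty "aspect":
    0.0 means a fully confident prediction, 1.0 means a uniform one.
    """
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def bimodality_coefficient(scores):
    """Sarle's bimodality coefficient from simple sample moments.

    Values above 5/9 (the value for a uniform distribution) are
    commonly read as a hint of bimodality.
    """
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((s - mean) ** 2 for s in scores) / n
    m3 = sum((s - mean) ** 3 for s in scores) / n
    m4 = sum((s - mean) ** 4 for s in scores) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    correction = 3.0 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return (skew ** 2 + 1.0) / (excess_kurtosis + correction)


def novel_cells_suspected(scores, cutoff=5.0 / 9.0):
    """Flag a target dataset whose uncertainty scores look bimodal."""
    return bimodality_coefficient(scores) > cutoff


# Toy uncertainty scores (made up): one low-uncertainty population,
# versus the same population mixed with a high-uncertainty one.
rng = random.Random(0)
known_only = [rng.gauss(0.2, 0.05) for _ in range(1000)]
mixed = known_only[:500] + [rng.gauss(0.8, 0.05) for _ in range(500)]
```

In this toy setup, `novel_cells_suspected(mixed)` flags the two-population case while `novel_cells_suspected(known_only)` does not. Only once such a test fires would a threshold be needed to split cells into "known" and "unknown" groups.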
format Online
Article
Text
id pubmed-10025768
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-10025768 2023-03-21 scEMAIL: Universal and Source-free Annotation Method for scRNA-seq Data with Novel Cell-type Perception. Wan, Hui; Chen, Liang; Deng, Minghua. Genomics Proteomics Bioinformatics. Method. Elsevier 2022-10 2023-01-03 /pmc/articles/PMC10025768/ /pubmed/36608843 http://dx.doi.org/10.1016/j.gpb.2022.12.008 Text en
© 2022 The Authors. Published by Elsevier B.V. and Science Press on behalf of Beijing Institute of Genomics, Chinese Academy of Sciences / China National Center for Bioinformation and Genetics Society of China. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title scEMAIL: Universal and Source-free Annotation Method for scRNA-seq Data with Novel Cell-type Perception
topic Method