scAnnotate: an automated cell-type annotation tool for single-cell RNA-sequencing data

MOTIVATION: Single-cell RNA-sequencing (scRNA-seq) technology enables researchers to investigate a genome at the cellular level with unprecedented resolution. An organism consists of a heterogeneous collection of cell types, each of which plays a distinct role in various biological processes. Hence,...

Bibliographic Details
Main Authors: Ji, Xiangling; Tsao, Danielle; Bai, Kailun; Tsao, Min; Xing, Li; Zhang, Xuekui
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10027414/
https://www.ncbi.nlm.nih.gov/pubmed/36949780
http://dx.doi.org/10.1093/bioadv/vbad030
_version_ 1784909707310268416
author Ji, Xiangling
Tsao, Danielle
Bai, Kailun
Tsao, Min
Xing, Li
Zhang, Xuekui
collection PubMed
description MOTIVATION: Single-cell RNA-sequencing (scRNA-seq) technology enables researchers to investigate a genome at the cellular level with unprecedented resolution. An organism consists of a heterogeneous collection of cell types, each of which plays a distinct role in various biological processes. Hence, the first step of scRNA-seq data analysis is often to distinguish cell types so that they can be investigated separately. Researchers have recently developed several automated cell-type annotation tools that require neither biological knowledge nor subjective human decisions. Dropout is a crucial characteristic of scRNA-seq data and is widely used in differential expression analysis. However, no current cell annotation method explicitly utilizes dropout information. Fully utilizing dropout information motivated this work. RESULTS: We present scAnnotate, a cell annotation tool that fully utilizes dropout information. We model every gene’s marginal distribution using a mixture model, which describes both the dropout proportion and the distribution of the non-dropout expression levels. Then, using an ensemble machine learning approach, we combine the mixture models of all genes into a single model for cell-type annotation. This combining approach avoids estimating the numerous parameters of the high-dimensional joint distribution of all genes. Using 14 real scRNA-seq datasets, we demonstrate that scAnnotate is competitive against nine existing annotation methods. Furthermore, because of its distinct modelling strategy, the cells misclassified by scAnnotate differ greatly from those misclassified by competitor methods. This suggests that using scAnnotate together with other methods could further improve annotation accuracy. AVAILABILITY AND IMPLEMENTATION: We implemented scAnnotate as an R package and made it publicly available from CRAN: https://cran.r-project.org/package=scAnnotate. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
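The modelling idea in the abstract (a per-gene mixture of a dropout proportion and a distribution for non-dropout expression, with the per-gene models combined across genes) can be illustrated with a minimal Python sketch. This is not the scAnnotate implementation, which is an R package. Everything below is an assumption made for illustration: the normal distribution for non-zero values, the Laplace smoothing, the standard-deviation floor, and the simple sum of per-gene log-likelihoods, which stands in for the ensemble step the abstract mentions. The function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict

MIN_SD = 0.1  # assumed floor on the standard deviation, avoids division by zero


def fit_gene_mixture(values):
    """Fit a two-part model to one gene within one cell type.

    Returns (pi, mu, sd): the smoothed dropout (zero) proportion, and the
    mean and standard deviation of the non-zero expression values.
    """
    nonzero = [v for v in values if v > 0]
    n_zero = len(values) - len(nonzero)
    # Laplace smoothing (an assumption) keeps pi strictly between 0 and 1.
    pi = (n_zero + 1) / (len(values) + 2)
    if nonzero:
        mu = sum(nonzero) / len(nonzero)
        var = sum((v - mu) ** 2 for v in nonzero) / len(nonzero)
    else:
        mu, var = 0.0, 0.0
    sd = max(math.sqrt(var), MIN_SD)
    return pi, mu, sd


def gene_log_likelihood(value, params):
    """Log-likelihood of one expression value under the two-part model."""
    pi, mu, sd = params
    if value <= 0:
        return math.log(pi)  # dropout component
    z = (value - mu) / sd
    # Non-dropout component: assumed normal density, weighted by (1 - pi).
    return math.log(1 - pi) - math.log(sd) - 0.5 * math.log(2 * math.pi) - 0.5 * z * z


def train(cells, labels):
    """Fit one per-gene model for every cell type.

    cells: list of expression vectors (one per cell, same gene order).
    labels: list of cell-type names, one per cell.
    """
    by_type = defaultdict(list)
    for cell, label in zip(cells, labels):
        by_type[label].append(cell)
    model = {}
    for label, group in by_type.items():
        n_genes = len(group[0])
        model[label] = [
            fit_gene_mixture([cell[g] for cell in group]) for g in range(n_genes)
        ]
    return model


def annotate(model, cell):
    """Return the cell type whose per-gene models best explain the cell.

    Summing per-gene log-likelihoods treats genes as independent, so no
    joint distribution over all genes has to be estimated.
    """
    scores = {
        label: sum(gene_log_likelihood(v, p) for v, p in zip(cell, params))
        for label, params in model.items()
    }
    return max(scores, key=scores.get)


# Toy, made-up data: 4 cells, 3 genes, normalized expression values.
train_cells = [
    [0.0, 2.1, 0.0],
    [0.0, 1.9, 0.3],
    [1.8, 0.0, 0.0],
    [2.2, 0.0, 0.1],
]
train_labels = ["T cell", "T cell", "B cell", "B cell"]
model = train(train_cells, train_labels)
print(annotate(model, [0.0, 2.0, 0.0]))
```

Note that a zero contributes evidence here through the dropout proportion rather than being ignored, which is the point the abstract makes about using dropout information. The per-gene fit needs only three numbers per gene and cell type, so the parameter count grows linearly with the number of genes.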
format Online
Article
Text
id pubmed-10027414
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10027414 2023-03-21 scAnnotate: an automated cell-type annotation tool for single-cell RNA-sequencing data Ji, Xiangling Tsao, Danielle Bai, Kailun Tsao, Min Xing, Li Zhang, Xuekui Bioinform Adv Original Paper Oxford University Press 2023-03-13 /pmc/articles/PMC10027414/ /pubmed/36949780 http://dx.doi.org/10.1093/bioadv/vbad030 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title scAnnotate: an automated cell-type annotation tool for single-cell RNA-sequencing data
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10027414/
https://www.ncbi.nlm.nih.gov/pubmed/36949780
http://dx.doi.org/10.1093/bioadv/vbad030