scClassify: sample size estimation and multiscale classification of cells using single and multiple reference

Automated cell type identification is a key computational challenge in single‐cell RNA‐sequencing (scRNA‐seq) data. To capitalise on the large collection of well‐annotated scRNA‐seq datasets, we developed scClassify, a multiscale classification framework based on ensemble learning and cell type hier...

Bibliographic Details
Main authors: Lin, Yingxin; Cao, Yue; Kim, Hani Jieun; Salim, Agus; Speed, Terence P; Lin, David M; Yang, Pengyi; Yang, Jean Yee Hwa
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7306901/
https://www.ncbi.nlm.nih.gov/pubmed/32567229
http://dx.doi.org/10.15252/msb.20199389
author Lin, Yingxin
Cao, Yue
Kim, Hani Jieun
Salim, Agus
Speed, Terence P
Lin, David M
Yang, Pengyi
Yang, Jean Yee Hwa
collection PubMed
description Automated cell type identification is a key computational challenge in single‐cell RNA‐sequencing (scRNA‐seq) data. To capitalise on the large collection of well‐annotated scRNA‐seq datasets, we developed scClassify, a multiscale classification framework based on ensemble learning and cell type hierarchies constructed from single or multiple annotated datasets as references. scClassify enables the estimation of sample size required for accurate classification of cell types in a cell type hierarchy and allows joint classification of cells when multiple references are available. We show that scClassify consistently performs better than other supervised cell type classification methods across 114 pairs of reference and testing data, representing a diverse combination of sizes, technologies and levels of complexity, and further demonstrate the unique components of scClassify through simulations and compendia of experimental datasets. Finally, we demonstrate the scalability of scClassify on large single‐cell atlases and highlight a novel application of identifying subpopulations of cells from the Tabula Muris data that were unidentified in the original publication. Together, scClassify represents state‐of‐the‐art methodology in automated cell type identification from scRNA‐seq data.
format Online
Article
Text
id pubmed-7306901
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-7306901 2020-06-22 scClassify: sample size estimation and multiscale classification of cells using single and multiple reference Lin, Yingxin Cao, Yue Kim, Hani Jieun Salim, Agus Speed, Terence P Lin, David M Yang, Pengyi Yang, Jean Yee Hwa Mol Syst Biol Methods John Wiley and Sons Inc. 2020-06-22 /pmc/articles/PMC7306901/ /pubmed/32567229 http://dx.doi.org/10.15252/msb.20199389 Text en © 2020 The Authors. Published under the terms of the CC BY 4.0 license. This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title scClassify: sample size estimation and multiscale classification of cells using single and multiple reference
topic Methods