scClassify: sample size estimation and multiscale classification of cells using single and multiple reference
Authors: | Lin, Yingxin; Cao, Yue; Kim, Hani Jieun; Salim, Agus; Speed, Terence P; Lin, David M; Yang, Pengyi; Yang, Jean Yee Hwa |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | John Wiley and Sons Inc., 2020 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7306901/ https://www.ncbi.nlm.nih.gov/pubmed/32567229 http://dx.doi.org/10.15252/msb.20199389 |
collection | PubMed |
description | Automated cell type identification is a key computational challenge in single‐cell RNA‐sequencing (scRNA‐seq) data. To capitalise on the large collection of well‐annotated scRNA‐seq datasets, we developed scClassify, a multiscale classification framework based on ensemble learning and cell type hierarchies constructed from single or multiple annotated datasets as references. scClassify enables the estimation of sample size required for accurate classification of cell types in a cell type hierarchy and allows joint classification of cells when multiple references are available. We show that scClassify consistently performs better than other supervised cell type classification methods across 114 pairs of reference and testing data, representing a diverse combination of sizes, technologies and levels of complexity, and further demonstrate the unique components of scClassify through simulations and compendia of experimental datasets. Finally, we demonstrate the scalability of scClassify on large single‐cell atlases and highlight a novel application of identifying subpopulations of cells from the Tabula Muris data that were unidentified in the original publication. Together, scClassify represents state‐of‐the‐art methodology in automated cell type identification from scRNA‐seq data. |
id | pubmed-7306901 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | Mol Syst Biol |
publicationDate | 2020-06-22 |
rights | © 2020 The Authors. Published under the terms of the CC BY 4.0 license. This is an open access article under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited. |
topic | Methods |