Immune cell type signature discovery and random forest classification for analysis of single cell gene expression datasets
Main authors: | Aybey, Bogac; Zhao, Sheng; Brors, Benedikt; Staub, Eike |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10441575/ https://www.ncbi.nlm.nih.gov/pubmed/37609075 http://dx.doi.org/10.3389/fimmu.2023.1194745 |
_version_ | 1785093404504358912 |
---|---|
author | Aybey, Bogac; Zhao, Sheng; Brors, Benedikt; Staub, Eike |
collection | PubMed |
description | BACKGROUND: Robust immune cell gene expression signatures are central to the analysis of single cell studies. Nearly all known sets of immune cell signatures have each been derived from a single gene expression dataset. Utilizing the power of multiple integrated datasets could lead to high-quality immune cell signatures, which could serve as superior inputs to machine learning-based cell type classification approaches. RESULTS: We established a novel workflow for the discovery of immune cell type signatures based primarily on gene-versus-gene expression similarity. It leverages multiple datasets (here, seven single cell expression datasets from six different cancer types) and yielded eleven immune cell type-specific gene expression signatures. We used these to train random forest classifiers for immune cell type assignment in single-cell RNA-seq datasets. In independent benchmarking datasets, we obtained prediction results similar to or better than those of commonly used cell type assignment methods. In random forest-based cell type classification, our gene signature set yields higher prediction scores than other published immune cell type gene sets. By re-analyzing a published IFN stimulation experiment, we further demonstrate how our approach helps to avoid bias in downstream statistical analyses. DISCUSSION AND CONCLUSION: We demonstrated the quality of our immune cell signatures and their strong performance in a random forest-based cell typing approach. We argue that classifying cells based on our comparatively slim gene sets with a random forest-based approach not only matches or outperforms widely used published approaches but also facilitates unbiased downstream statistical analyses of differential gene expression between cell types for significantly more genes than previous cell classification algorithms. |
format | Online Article Text |
id | pubmed-10441575 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-10441575 2023-08-22; Front Immunol (Immunology); Frontiers Media S.A., 2023-08-04; /pmc/articles/PMC10441575/; /pubmed/37609075; http://dx.doi.org/10.3389/fimmu.2023.1194745; Text, en; Copyright © 2023 Aybey, Zhao, Brors and Staub. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Immune cell type signature discovery and random forest classification for analysis of single cell gene expression datasets |
topic | Immunology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10441575/ https://www.ncbi.nlm.nih.gov/pubmed/37609075 http://dx.doi.org/10.3389/fimmu.2023.1194745 |
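The abstract describes signature discovery "based primarily on gene-versus-gene expression similarity" across multiple datasets, but this record does not state the similarity measure, the thresholds, or how results from different datasets are combined. The following is therefore only an illustrative sketch under stated assumptions, not the authors' workflow. It assumes Pearson correlation across cells as the similarity measure, a fixed cutoff `min_r`, and a rule that a gene must pass the cutoff in every dataset. The helper names (`pearson`, `candidate_signature`), the gene names, and the toy expression values are all hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length per-cell expression vectors."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    # A constant vector has no variance; treat its correlation as 0 (assumption).
    if denom == 0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / denom


def candidate_signature(datasets, seed_gene, min_r=0.5):
    """Hypothetical helper: genes whose expression correlates with seed_gene
    at r >= min_r in every dataset (assumed cross-dataset consistency rule).

    Each dataset is a dict mapping gene name -> list of per-cell expression values.
    """
    kept = None
    for expr in datasets:
        seed = expr[seed_gene]
        hits = {
            gene
            for gene, values in expr.items()
            if gene != seed_gene and pearson(seed, values) >= min_r
        }
        # Keep only genes that also passed in all previous datasets.
        kept = hits if kept is None else kept & hits
    return sorted(kept or set())


# Toy, made-up data: 4 cells per dataset, hypothetical gene names.
ds1 = {
    "MARKER": [5, 4, 0, 0],
    "GENE_A": [4, 5, 0, 1],
    "GENE_B": [0, 0, 6, 5],
    "GENE_C": [3, 3, 1, 0],
}
ds2 = {
    "MARKER": [0, 6, 5, 0],
    "GENE_A": [1, 5, 6, 0],
    "GENE_B": [5, 0, 0, 6],
    "GENE_C": [0, 0, 1, 2],
}

print(candidate_signature([ds1], "MARKER"))       # ['GENE_A', 'GENE_C']
print(candidate_signature([ds1, ds2], "MARKER"))  # ['GENE_A']
```

With these toy inputs, `GENE_C` tracks the seed gene in the first dataset only, so the every-dataset rule drops it. Filtering out such dataset-specific co-expression is one plausible reason to require agreement across several datasets rather than relying on a single one.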