Immune cell type signature discovery and random forest classification for analysis of single cell gene expression datasets

Bibliographic Details
Main Authors: Aybey, Bogac; Zhao, Sheng; Brors, Benedikt; Staub, Eike
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects: Immunology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10441575/
https://www.ncbi.nlm.nih.gov/pubmed/37609075
http://dx.doi.org/10.3389/fimmu.2023.1194745
author Aybey, Bogac
Zhao, Sheng
Brors, Benedikt
Staub, Eike
author_facet Aybey, Bogac
Zhao, Sheng
Brors, Benedikt
Staub, Eike
author_sort Aybey, Bogac
collection PubMed
description BACKGROUND: Robust immune cell gene expression signatures are central to the analysis of single cell studies. Nearly all known sets of immune cell signatures have been derived from only a single gene expression dataset. Utilizing the power of multiple integrated datasets could lead to high-quality immune cell signatures which could be used as superior inputs to machine learning-based cell type classification approaches. RESULTS: We established a novel workflow for the discovery of immune cell type signatures based primarily on gene-versus-gene expression similarity. It leverages multiple datasets, here seven single cell expression datasets from six different cancer types, and resulted in eleven immune cell type-specific gene expression signatures. We used these to train random forest classifiers for immune cell type assignment for single-cell RNA-seq datasets. We obtained similar or better prediction results compared to commonly used methods for cell type assignment in independent benchmarking datasets. Our gene signature set yields higher prediction scores than other published immune cell type gene sets in random forest-based cell type classification. We further demonstrate how our approach helps to avoid bias in downstream statistical analyses by re-analysis of a published IFN stimulation experiment. DISCUSSION AND CONCLUSION: We demonstrated the quality of our immune cell signatures and their strong performance in a random forest-based cell typing approach. We argue that classifying cells based on our comparatively slim sets of genes accompanied by a random forest-based approach not only matches or outperforms widely used published approaches but also facilitates unbiased downstream statistical analyses of differential gene expression between cell types for significantly more genes compared to previous cell classification algorithms.
format Online
Article
Text
id pubmed-10441575
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10441575 2023-08-22 Immune cell type signature discovery and random forest classification for analysis of single cell gene expression datasets Aybey, Bogac Zhao, Sheng Brors, Benedikt Staub, Eike Front Immunol Immunology Frontiers Media S.A. 2023-08-04 /pmc/articles/PMC10441575/ /pubmed/37609075 http://dx.doi.org/10.3389/fimmu.2023.1194745 Text en Copyright © 2023 Aybey, Zhao, Brors and Staub https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Immunology
Aybey, Bogac
Zhao, Sheng
Brors, Benedikt
Staub, Eike
Immune cell type signature discovery and random forest classification for analysis of single cell gene expression datasets
title Immune cell type signature discovery and random forest classification for analysis of single cell gene expression datasets
title_sort immune cell type signature discovery and random forest classification for analysis of single cell gene expression datasets
topic Immunology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10441575/
https://www.ncbi.nlm.nih.gov/pubmed/37609075
http://dx.doi.org/10.3389/fimmu.2023.1194745