A machine learning method for the discovery of minimum marker gene combinations for cell type identification from single-cell RNA sequencing

Single-cell genomics is rapidly advancing our knowledge of the diversity of cell phenotypes, including both cell types and cell states. Driven by single-cell/-nucleus RNA sequencing (scRNA-seq), comprehensive cell atlas projects characterizing a wide range of organisms and tissues are currently unde...

Bibliographic Details
Main Authors: Aevermann, Brian; Zhang, Yun; Novotny, Mark; Keshk, Mohamed; Bakken, Trygve; Miller, Jeremy; Hodge, Rebecca; Lelieveldt, Boudewijn; Lein, Ed; Scheuermann, Richard H.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8494219/
https://www.ncbi.nlm.nih.gov/pubmed/34088715
http://dx.doi.org/10.1101/gr.275569.121
author Aevermann, Brian
Zhang, Yun
Novotny, Mark
Keshk, Mohamed
Bakken, Trygve
Miller, Jeremy
Hodge, Rebecca
Lelieveldt, Boudewijn
Lein, Ed
Scheuermann, Richard H.
collection PubMed
description Single-cell genomics is rapidly advancing our knowledge of the diversity of cell phenotypes, including both cell types and cell states. Driven by single-cell/-nucleus RNA sequencing (scRNA-seq), comprehensive cell atlas projects characterizing a wide range of organisms and tissues are currently underway. As a result, it is critical that the transcriptional phenotypes discovered are defined and disseminated in a consistent and concise manner. Molecular biomarkers have historically played an important role in biological research, from defining immune cell types by surface protein expression to defining diseases by their molecular drivers. Here, we describe a machine learning-based marker gene selection algorithm, NS-Forest version 2.0, which leverages the nonlinear attributes of random forest feature selection and a binary expression scoring approach to discover the minimal marker gene expression combinations that optimally capture the cell type identity represented in complete scRNA-seq transcriptional profiles. The marker genes selected provide an expression barcode that serves as both a useful tool for downstream biological investigation and the necessary and sufficient characteristics for semantic cell type definition. The use of NS-Forest to identify marker genes for human brain middle temporal gyrus cell types reveals the importance of cell signaling and noncoding RNAs in neuronal cell type identity.
format Online
Article
Text
id pubmed-8494219
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-8494219 2021-10-07 Genome Res Method Cold Spring Harbor Laboratory Press 2021-10 /pmc/articles/PMC8494219/ /pubmed/34088715 http://dx.doi.org/10.1101/gr.275569.121 Text en © 2021 Aevermann et al.; Published by Cold Spring Harbor Laboratory Press. This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at https://creativecommons.org/licenses/by-nc/4.0/.
title A machine learning method for the discovery of minimum marker gene combinations for cell type identification from single-cell RNA sequencing
topic Method
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8494219/
https://www.ncbi.nlm.nih.gov/pubmed/34088715
http://dx.doi.org/10.1101/gr.275569.121