A machine learning method for the discovery of minimum marker gene combinations for cell type identification from single-cell RNA sequencing
Single-cell genomics is rapidly advancing our knowledge of the diversity of cell phenotypes, including both cell types and cell states. Driven by single-cell/-nucleus RNA sequencing (scRNA-seq), comprehensive cell atlas projects characterizing a wide range of organisms and tissues are currently unde...
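Main Authors: | Aevermann, Brian; Zhang, Yun; Novotny, Mark; Keshk, Mohamed; Bakken, Trygve; Miller, Jeremy; Hodge, Rebecca; Lelieveldt, Boudewijn; Lein, Ed; Scheuermann, Richard H. |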
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory Press 2021 |
Subjects: | Method |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8494219/ https://www.ncbi.nlm.nih.gov/pubmed/34088715 http://dx.doi.org/10.1101/gr.275569.121 |
_version_ | 1784579264634421248 |
---|---|
author | Aevermann, Brian Zhang, Yun Novotny, Mark Keshk, Mohamed Bakken, Trygve Miller, Jeremy Hodge, Rebecca Lelieveldt, Boudewijn Lein, Ed Scheuermann, Richard H. |
author_facet | Aevermann, Brian Zhang, Yun Novotny, Mark Keshk, Mohamed Bakken, Trygve Miller, Jeremy Hodge, Rebecca Lelieveldt, Boudewijn Lein, Ed Scheuermann, Richard H. |
author_sort | Aevermann, Brian |
collection | PubMed |
description | Single-cell genomics is rapidly advancing our knowledge of the diversity of cell phenotypes, including both cell types and cell states. Driven by single-cell/-nucleus RNA sequencing (scRNA-seq), comprehensive cell atlas projects characterizing a wide range of organisms and tissues are currently underway. As a result, it is critical that the transcriptional phenotypes discovered are defined and disseminated in a consistent and concise manner. Molecular biomarkers have historically played an important role in biological research, from defining immune cell types by surface protein expression to defining diseases by their molecular drivers. Here, we describe a machine learning-based marker gene selection algorithm, NS-Forest version 2.0, which leverages the nonlinear attributes of random forest feature selection and a binary expression scoring approach to discover the minimal marker gene expression combinations that optimally capture the cell type identity represented in complete scRNA-seq transcriptional profiles. The marker genes selected provide an expression barcode that serves as both a useful tool for downstream biological investigation and the necessary and sufficient characteristics for semantic cell type definition. The use of NS-Forest to identify marker genes for human brain middle temporal gyrus cell types reveals the importance of cell signaling and noncoding RNAs in neuronal cell type identity. |
format | Online Article Text |
id | pubmed-8494219 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Cold Spring Harbor Laboratory Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-84942192021-10-07 A machine learning method for the discovery of minimum marker gene combinations for cell type identification from single-cell RNA sequencing Aevermann, Brian Zhang, Yun Novotny, Mark Keshk, Mohamed Bakken, Trygve Miller, Jeremy Hodge, Rebecca Lelieveldt, Boudewijn Lein, Ed Scheuermann, Richard H. Genome Res Method Single-cell genomics is rapidly advancing our knowledge of the diversity of cell phenotypes, including both cell types and cell states. Driven by single-cell/-nucleus RNA sequencing (scRNA-seq), comprehensive cell atlas projects characterizing a wide range of organisms and tissues are currently underway. As a result, it is critical that the transcriptional phenotypes discovered are defined and disseminated in a consistent and concise manner. Molecular biomarkers have historically played an important role in biological research, from defining immune cell types by surface protein expression to defining diseases by their molecular drivers. Here, we describe a machine learning-based marker gene selection algorithm, NS-Forest version 2.0, which leverages the nonlinear attributes of random forest feature selection and a binary expression scoring approach to discover the minimal marker gene expression combinations that optimally capture the cell type identity represented in complete scRNA-seq transcriptional profiles. The marker genes selected provide an expression barcode that serves as both a useful tool for downstream biological investigation and the necessary and sufficient characteristics for semantic cell type definition. The use of NS-Forest to identify marker genes for human brain middle temporal gyrus cell types reveals the importance of cell signaling and noncoding RNAs in neuronal cell type identity. 
Cold Spring Harbor Laboratory Press 2021-10 /pmc/articles/PMC8494219/ /pubmed/34088715 http://dx.doi.org/10.1101/gr.275569.121 Text en © 2021 Aevermann et al.; Published by Cold Spring Harbor Laboratory Press https://creativecommons.org/licenses/by-nc/4.0/ This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/. |
spellingShingle | Method Aevermann, Brian Zhang, Yun Novotny, Mark Keshk, Mohamed Bakken, Trygve Miller, Jeremy Hodge, Rebecca Lelieveldt, Boudewijn Lein, Ed Scheuermann, Richard H. A machine learning method for the discovery of minimum marker gene combinations for cell type identification from single-cell RNA sequencing |
title | A machine learning method for the discovery of minimum marker gene combinations for cell type identification from single-cell RNA sequencing |
title_full | A machine learning method for the discovery of minimum marker gene combinations for cell type identification from single-cell RNA sequencing |
title_fullStr | A machine learning method for the discovery of minimum marker gene combinations for cell type identification from single-cell RNA sequencing |
title_full_unstemmed | A machine learning method for the discovery of minimum marker gene combinations for cell type identification from single-cell RNA sequencing |
title_short | A machine learning method for the discovery of minimum marker gene combinations for cell type identification from single-cell RNA sequencing |
title_sort | machine learning method for the discovery of minimum marker gene combinations for cell type identification from single-cell rna sequencing |
topic | Method |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8494219/ https://www.ncbi.nlm.nih.gov/pubmed/34088715 http://dx.doi.org/10.1101/gr.275569.121 |
work_keys_str_mv | AT aevermannbrian amachinelearningmethodforthediscoveryofminimummarkergenecombinationsforcelltypeidentificationfromsinglecellrnasequencing AT zhangyun amachinelearningmethodforthediscoveryofminimummarkergenecombinationsforcelltypeidentificationfromsinglecellrnasequencing AT novotnymark amachinelearningmethodforthediscoveryofminimummarkergenecombinationsforcelltypeidentificationfromsinglecellrnasequencing AT keshkmohamed amachinelearningmethodforthediscoveryofminimummarkergenecombinationsforcelltypeidentificationfromsinglecellrnasequencing AT bakkentrygve amachinelearningmethodforthediscoveryofminimummarkergenecombinationsforcelltypeidentificationfromsinglecellrnasequencing AT millerjeremy amachinelearningmethodforthediscoveryofminimummarkergenecombinationsforcelltypeidentificationfromsinglecellrnasequencing AT hodgerebecca amachinelearningmethodforthediscoveryofminimummarkergenecombinationsforcelltypeidentificationfromsinglecellrnasequencing AT lelieveldtboudewijn amachinelearningmethodforthediscoveryofminimummarkergenecombinationsforcelltypeidentificationfromsinglecellrnasequencing AT leined amachinelearningmethodforthediscoveryofminimummarkergenecombinationsforcelltypeidentificationfromsinglecellrnasequencing AT scheuermannrichardh amachinelearningmethodforthediscoveryofminimummarkergenecombinationsforcelltypeidentificationfromsinglecellrnasequencing AT aevermannbrian machinelearningmethodforthediscoveryofminimummarkergenecombinationsforcelltypeidentificationfromsinglecellrnasequencing AT zhangyun machinelearningmethodforthediscoveryofminimummarkergenecombinationsforcelltypeidentificationfromsinglecellrnasequencing AT novotnymark machinelearningmethodforthediscoveryofminimummarkergenecombinationsforcelltypeidentificationfromsinglecellrnasequencing AT keshkmohamed machinelearningmethodforthediscoveryofminimummarkergenecombinationsforcelltypeidentificationfromsinglecellrnasequencing AT bakkentrygve 
machinelearningmethodforthediscoveryofminimummarkergenecombinationsforcelltypeidentificationfromsinglecellrnasequencing AT millerjeremy machinelearningmethodforthediscoveryofminimummarkergenecombinationsforcelltypeidentificationfromsinglecellrnasequencing AT hodgerebecca machinelearningmethodforthediscoveryofminimummarkergenecombinationsforcelltypeidentificationfromsinglecellrnasequencing AT lelieveldtboudewijn machinelearningmethodforthediscoveryofminimummarkergenecombinationsforcelltypeidentificationfromsinglecellrnasequencing AT leined machinelearningmethodforthediscoveryofminimummarkergenecombinationsforcelltypeidentificationfromsinglecellrnasequencing AT scheuermannrichardh machinelearningmethodforthediscoveryofminimummarkergenecombinationsforcelltypeidentificationfromsinglecellrnasequencing |
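
The description above mentions a "binary expression scoring approach" but this record does not give the formula. As a rough illustration only, the sketch below shows one plausible form such a score could take: a gene scores near 1 for a target cell type when its median expression in every other cell type is close to zero. The function names (`binary_expression_score`, `rank_markers`), the median-ratio formula, and the toy data are all assumptions made for this example and are not taken from the record or from the NS-Forest code. The random forest feature selection step is not shown.

```python
from statistics import median


def binary_expression_score(expr_by_cluster, target):
    """Hypothetical score in [0, 1] for one gene and one target cluster.

    expr_by_cluster maps a cluster name to a list of expression values
    for a single gene. A score of 1 means the gene's median expression
    is zero in every non-target cluster (an on/off pattern).
    """
    medians = {name: median(values) for name, values in expr_by_cluster.items()}
    target_median = medians[target]
    others = [m for name, m in medians.items() if name != target]
    if target_median <= 0 or not others:
        return 0.0  # gene is not expressed in the target, or nothing to compare
    # Assumed form: average of (1 - other/target), floored at zero per cluster.
    total = sum(max(0.0, 1.0 - m / target_median) for m in others)
    return total / len(others)


def rank_markers(gene_expr, target):
    """Return gene names sorted from most to least binary for `target`."""
    scores = {g: binary_expression_score(e, target) for g, e in gene_expr.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (invented): two genes measured in three clusters.
toy = {
    "GENE_A": {"t1": [8, 9, 10], "t2": [0, 0, 1], "t3": [0, 0, 0]},
    "GENE_B": {"t1": [8, 9, 10], "t2": [7, 9, 9], "t3": [4, 5, 6]},
}

print(rank_markers(toy, "t1"))
```

In this toy case `GENE_A` is expressed almost exclusively in `t1`, so it ranks ahead of `GENE_B`, which is expressed at similar levels in `t2`. In a fuller pipeline, a score like this would presumably be applied to candidate genes already shortlisted by random forest feature importance.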