Identification of new marker genes from plant single‐cell RNA‐seq data using interpretable machine learning methods

An essential step in the analysis of single‐cell RNA sequencing data is to classify cells into specific cell types using marker genes. In this study, we have developed a machine learning pipeline called single‐cell predictive marker (SPmarker) to identify novel cell‐type marker genes in the Arabidop...

Bibliographic Details
Main Authors: Yan, Haidong, Lee, Jiyoung, Song, Qi, Li, Qi, Schiefelbein, John, Zhao, Bingyu, Li, Song
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9314150/
https://www.ncbi.nlm.nih.gov/pubmed/35211979
http://dx.doi.org/10.1111/nph.18053
_version_ 1784754250226597888
author Yan, Haidong
Lee, Jiyoung
Song, Qi
Li, Qi
Schiefelbein, John
Zhao, Bingyu
Li, Song
author_facet Yan, Haidong
Lee, Jiyoung
Song, Qi
Li, Qi
Schiefelbein, John
Zhao, Bingyu
Li, Song
author_sort Yan, Haidong
collection PubMed
description An essential step in the analysis of single‐cell RNA sequencing data is to classify cells into specific cell types using marker genes. In this study, we have developed a machine learning pipeline called single‐cell predictive marker (SPmarker) to identify novel cell‐type marker genes in the Arabidopsis root. Unlike traditional approaches, our method uses interpretable machine learning models to select marker genes. We have demonstrated that our method can: assign cell types based on cells that were labelled using published methods; project cell types identified by trajectory analysis from one data set to other data sets; and assign cell types based on internal GFP markers. Using SPmarker, we have identified hundreds of new marker genes that were not identified before. As compared to known marker genes, the new marker genes have more orthologous genes identifiable in the corresponding rice single‐cell clusters. The new root hair marker genes also include 172 genes with orthologs expressed in root hair cells in five non‐Arabidopsis species, which expands the number of marker genes for this cell type by 35–154%. Our results represent a new approach to identifying cell‐type marker genes from scRNA‐seq data and pave the way for cross‐species mapping of scRNA‐seq data in plants.
format Online
Article
Text
id pubmed-9314150
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-9314150 2022-07-30 Identification of new marker genes from plant single‐cell RNA‐seq data using interpretable machine learning methods Yan, Haidong Lee, Jiyoung Song, Qi Li, Qi Schiefelbein, John Zhao, Bingyu Li, Song New Phytol Research An essential step in the analysis of single‐cell RNA sequencing data is to classify cells into specific cell types using marker genes. In this study, we have developed a machine learning pipeline called single‐cell predictive marker (SPmarker) to identify novel cell‐type marker genes in the Arabidopsis root. Unlike traditional approaches, our method uses interpretable machine learning models to select marker genes. We have demonstrated that our method can: assign cell types based on cells that were labelled using published methods; project cell types identified by trajectory analysis from one data set to other data sets; and assign cell types based on internal GFP markers. Using SPmarker, we have identified hundreds of new marker genes that were not identified before. As compared to known marker genes, the new marker genes have more orthologous genes identifiable in the corresponding rice single‐cell clusters. The new root hair marker genes also include 172 genes with orthologs expressed in root hair cells in five non‐Arabidopsis species, which expands the number of marker genes for this cell type by 35–154%. Our results represent a new approach to identifying cell‐type marker genes from scRNA‐seq data and pave the way for cross‐species mapping of scRNA‐seq data in plants. John Wiley and Sons Inc. 2022-03-26 2022-05 /pmc/articles/PMC9314150/ /pubmed/35211979 http://dx.doi.org/10.1111/nph.18053 Text en © 2022 The Authors.
New Phytologist © 2022 New Phytologist Foundation https://creativecommons.org/licenses/by-nc/4.0/ This is an open access article under the terms of the http://creativecommons.org/licenses/by-nc/4.0/ (https://creativecommons.org/licenses/by-nc/4.0/) License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
spellingShingle Research
Yan, Haidong
Lee, Jiyoung
Song, Qi
Li, Qi
Schiefelbein, John
Zhao, Bingyu
Li, Song
Identification of new marker genes from plant single‐cell RNA‐seq data using interpretable machine learning methods
title Identification of new marker genes from plant single‐cell RNA‐seq data using interpretable machine learning methods
title_full Identification of new marker genes from plant single‐cell RNA‐seq data using interpretable machine learning methods
title_fullStr Identification of new marker genes from plant single‐cell RNA‐seq data using interpretable machine learning methods
title_full_unstemmed Identification of new marker genes from plant single‐cell RNA‐seq data using interpretable machine learning methods
title_short Identification of new marker genes from plant single‐cell RNA‐seq data using interpretable machine learning methods
title_sort identification of new marker genes from plant single‐cell rna‐seq data using interpretable machine learning methods
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9314150/
https://www.ncbi.nlm.nih.gov/pubmed/35211979
http://dx.doi.org/10.1111/nph.18053
work_keys_str_mv AT yanhaidong identificationofnewmarkergenesfromplantsinglecellrnaseqdatausinginterpretablemachinelearningmethods
AT leejiyoung identificationofnewmarkergenesfromplantsinglecellrnaseqdatausinginterpretablemachinelearningmethods
AT songqi identificationofnewmarkergenesfromplantsinglecellrnaseqdatausinginterpretablemachinelearningmethods
AT liqi identificationofnewmarkergenesfromplantsinglecellrnaseqdatausinginterpretablemachinelearningmethods
AT schiefelbeinjohn identificationofnewmarkergenesfromplantsinglecellrnaseqdatausinginterpretablemachinelearningmethods
AT zhaobingyu identificationofnewmarkergenesfromplantsinglecellrnaseqdatausinginterpretablemachinelearningmethods
AT lisong identificationofnewmarkergenesfromplantsinglecellrnaseqdatausinginterpretablemachinelearningmethods