MarcoPolo: a method to discover differentially expressed genes in single-cell RNA-seq data without depending on prior clustering

The standard analysis pipeline for single-cell RNA-seq data consists of sequential steps initiated by clustering the cells. An innate limitation of this pipeline is that an imperfect clustering result can irreversibly affect the succeeding steps. For example, there can be cell types not well disting...


Bibliographic Details
Main authors: Kim, Chanwoo; Lee, Hanbin; Jeong, Juhee; Jung, Keehoon; Han, Buhm
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects: Methods Online
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9262626/
https://www.ncbi.nlm.nih.gov/pubmed/35420135
http://dx.doi.org/10.1093/nar/gkac216
author Kim, Chanwoo
Lee, Hanbin
Jeong, Juhee
Jung, Keehoon
Han, Buhm
collection PubMed
description The standard analysis pipeline for single-cell RNA-seq data consists of sequential steps initiated by clustering the cells. An innate limitation of this pipeline is that an imperfect clustering result can irreversibly affect the succeeding steps. For example, there can be cell types not well distinguished by clustering because they largely share the global structure, such as the anterior primitive streak and mid primitive streak cells. If one searches for differentially expressed genes (DEGs) solely based on clustering, marker genes for distinguishing these types will be missed. Moreover, clustering depends on many parameters and is often subject to manual decisions. To overcome these limitations, we propose MarcoPolo, a method that identifies informative DEGs independently of prior clustering. MarcoPolo ranks genes by evaluating whether their expression distributions are bimodal, whether similar expression patterns are observed in other genes, and whether the expressing cells are proximal in a low-dimensional space. Using real datasets with FACS-purified cell labels, we demonstrate that MarcoPolo recovers marker genes better than competing methods. Notably, MarcoPolo finds key genes that can distinguish cell types that are not distinguishable by the standard clustering. MarcoPolo is provided as a convenient software package that reports analysis results in an HTML file.
format Online
Article
Text
id pubmed-9262626
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9262626 2022-07-08 Nucleic Acids Res Methods Online Oxford University Press 2022-04-14 /pmc/articles/PMC9262626/ /pubmed/35420135 http://dx.doi.org/10.1093/nar/gkac216 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title MarcoPolo: a method to discover differentially expressed genes in single-cell RNA-seq data without depending on prior clustering
topic Methods Online