MarcoPolo: a method to discover differentially expressed genes in single-cell RNA-seq data without depending on prior clustering
The standard analysis pipeline for single-cell RNA-seq data consists of sequential steps initiated by clustering the cells. An innate limitation of this pipeline is that an imperfect clustering result can irreversibly affect the succeeding steps. For example, there can be cell types not well disting...
Main Authors: | Kim, Chanwoo; Lee, Hanbin; Jeong, Juhee; Jung, Keehoon; Han, Buhm |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2022 |
Subjects: | Methods Online |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9262626/ https://www.ncbi.nlm.nih.gov/pubmed/35420135 http://dx.doi.org/10.1093/nar/gkac216 |
_version_ | 1784742543494217728 |
---|---|
author | Kim, Chanwoo; Lee, Hanbin; Jeong, Juhee; Jung, Keehoon; Han, Buhm |
author_facet | Kim, Chanwoo; Lee, Hanbin; Jeong, Juhee; Jung, Keehoon; Han, Buhm |
author_sort | Kim, Chanwoo |
collection | PubMed |
description | The standard analysis pipeline for single-cell RNA-seq data consists of sequential steps initiated by clustering the cells. An innate limitation of this pipeline is that an imperfect clustering result can irreversibly affect the succeeding steps. For example, there can be cell types not well distinguished by clustering because they largely share the global structure, such as the anterior primitive streak and mid primitive streak cells. If one searches for differentially expressed genes (DEGs) solely based on clustering, marker genes for distinguishing these types will be missed. Moreover, clustering depends on many parameters and is often subject to manual decisions. To overcome these limitations, we propose MarcoPolo, a method that identifies informative DEGs independently of prior clustering. MarcoPolo ranks genes by evaluating whether their expression distributions are bimodal, whether similar expression patterns are observed in other genes, and whether the expressing cells are proximal in a low-dimensional space. Using real datasets with FACS-purified cell labels, we demonstrate that MarcoPolo recovers marker genes better than competing methods. Notably, MarcoPolo finds key genes that can distinguish cell types that are not distinguishable by the standard clustering. MarcoPolo is built in a convenient software package that provides analysis results in an HTML file. |
format | Online Article Text |
id | pubmed-9262626 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9262626 2022-07-08 MarcoPolo: a method to discover differentially expressed genes in single-cell RNA-seq data without depending on prior clustering Kim, Chanwoo Lee, Hanbin Jeong, Juhee Jung, Keehoon Han, Buhm Nucleic Acids Res Methods Online The standard analysis pipeline for single-cell RNA-seq data consists of sequential steps initiated by clustering the cells. An innate limitation of this pipeline is that an imperfect clustering result can irreversibly affect the succeeding steps. For example, there can be cell types not well distinguished by clustering because they largely share the global structure, such as the anterior primitive streak and mid primitive streak cells. If one searches differentially expressed genes (DEGs) solely based on clustering, marker genes for distinguishing these types will be missed. Moreover, clustering depends on many parameters and can often be subjective to manual decisions. To overcome these limitations, we propose MarcoPolo, a method that identifies informative DEGs independently of prior clustering. MarcoPolo sorts out genes by evaluating if the distributions are bimodal, if similar expression patterns are observed in other genes, and if the expressing cells are proximal in a low-dimensional space. Using real datasets with FACS-purified cell labels, we demonstrate that MarcoPolo recovers marker genes better than competing methods. Notably, MarcoPolo finds key genes that can distinguish cell types that are not distinguishable by the standard clustering. MarcoPolo is built in a convenient software package that provides analysis results in an HTML file. Oxford University Press 2022-04-14 /pmc/articles/PMC9262626/ /pubmed/35420135 http://dx.doi.org/10.1093/nar/gkac216 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Methods Online Kim, Chanwoo Lee, Hanbin Jeong, Juhee Jung, Keehoon Han, Buhm MarcoPolo: a method to discover differentially expressed genes in single-cell RNA-seq data without depending on prior clustering |
title | MarcoPolo: a method to discover differentially expressed genes in single-cell RNA-seq data without depending on prior clustering |
title_full | MarcoPolo: a method to discover differentially expressed genes in single-cell RNA-seq data without depending on prior clustering |
title_fullStr | MarcoPolo: a method to discover differentially expressed genes in single-cell RNA-seq data without depending on prior clustering |
title_full_unstemmed | MarcoPolo: a method to discover differentially expressed genes in single-cell RNA-seq data without depending on prior clustering |
title_short | MarcoPolo: a method to discover differentially expressed genes in single-cell RNA-seq data without depending on prior clustering |
title_sort | marcopolo: a method to discover differentially expressed genes in single-cell rna-seq data without depending on prior clustering |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9262626/ https://www.ncbi.nlm.nih.gov/pubmed/35420135 http://dx.doi.org/10.1093/nar/gkac216 |
work_keys_str_mv | AT kimchanwoo marcopoloamethodtodiscoverdifferentiallyexpressedgenesinsinglecellrnaseqdatawithoutdependingonpriorclustering AT leehanbin marcopoloamethodtodiscoverdifferentiallyexpressedgenesinsinglecellrnaseqdatawithoutdependingonpriorclustering AT jeongjuhee marcopoloamethodtodiscoverdifferentiallyexpressedgenesinsinglecellrnaseqdatawithoutdependingonpriorclustering AT jungkeehoon marcopoloamethodtodiscoverdifferentiallyexpressedgenesinsinglecellrnaseqdatawithoutdependingonpriorclustering AT hanbuhm marcopoloamethodtodiscoverdifferentiallyexpressedgenesinsinglecellrnaseqdatawithoutdependingonpriorclustering |
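
Illustrative note: the description above says that genes are evaluated for bimodal expression without relying on clusters. The sketch below shows one way such a per-gene bimodality check could look. It is not taken from the MarcoPolo package: the function names, the two-component Poisson mixture fitted by EM, and the toy counts are all assumptions made for illustration, and the other two criteria in the description (co-expression with other genes, proximity in a low-dimensional space) are not covered.

```python
import numpy as np

def poisson_loglik_terms(counts, rate):
    # Poisson log-pmf without the -log(y!) term; that term is identical
    # for every model compared below, so it cancels in the difference.
    return counts * np.log(rate) - rate

def bimodality_gain(counts, n_iter=200, eps=1e-8):
    """Hypothetical helper: log-likelihood gain of a two-component Poisson
    mixture over a single Poisson, for one gene's counts across cells."""
    counts = np.asarray(counts, dtype=float)
    mean = max(counts.mean(), eps)
    one_component = poisson_loglik_terms(counts, mean).sum()

    # Crude starting values: a low-rate and a high-rate component.
    lo, hi, w = 0.5 * mean, 2.0 * mean + 1.0, 0.5
    for _ in range(n_iter):
        # E-step: responsibility of the high component for each cell.
        log_lo = np.log(1.0 - w) + poisson_loglik_terms(counts, lo)
        log_hi = np.log(w) + poisson_loglik_terms(counts, hi)
        resp = np.exp(log_hi - np.logaddexp(log_lo, log_hi))
        # M-step: update mixing weight and both rates.
        w = float(np.clip(resp.mean(), eps, 1.0 - eps))
        hi = max((resp * counts).sum() / max(resp.sum(), eps), eps)
        lo = max(((1.0 - resp) * counts).sum() / max((1.0 - resp).sum(), eps), eps)

    log_lo = np.log(1.0 - w) + poisson_loglik_terms(counts, lo)
    log_hi = np.log(w) + poisson_loglik_terms(counts, hi)
    two_component = np.logaddexp(log_lo, log_hi).sum()
    return two_component - one_component, (w, lo, hi)

# Toy data (simulated, not from the paper): one gene expressed in a
# quarter of the cells, and one gene with a single expression mode.
rng = np.random.default_rng(0)
marker_like = np.concatenate([rng.poisson(0.2, 300), rng.poisson(8.0, 100)])
uniform_like = rng.poisson(2.15, 400)

gain_marker, (w_marker, lo_marker, hi_marker) = bimodality_gain(marker_like)
gain_uniform, _ = bimodality_gain(uniform_like)
print(f"marker-like gain: {gain_marker:.1f}, uniform-like gain: {gain_uniform:.1f}")
```

Under these assumptions, a gene with a larger gain is better explained by an "on" and an "off" group of cells than by a single expression level, which is the kind of signal a clustering-free ranking can use. Real count data would also need per-cell size factors, which this sketch omits.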