SMaSH: a scalable, general marker gene identification framework for single-cell RNA-sequencing
BACKGROUND: Single-cell RNA-sequencing is revolutionising the study of cellular and tissue-wide heterogeneity in a large number of biological scenarios, from highly tissue-specific studies of disease to human-wide cell atlases. A central task in single-cell RNA-sequencing analysis design is the calc...
Main authors: | Nelson, M. E., Riva, S. G., Cvejic, A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9361618/ https://www.ncbi.nlm.nih.gov/pubmed/35941549 http://dx.doi.org/10.1186/s12859-022-04860-2 |
_version_ | 1784764565721972736 |
---|---|
author | Nelson, M. E. Riva, S. G. Cvejic, A. |
author_facet | Nelson, M. E. Riva, S. G. Cvejic, A. |
author_sort | Nelson, M. E. |
collection | PubMed |
description | BACKGROUND: Single-cell RNA-sequencing is revolutionising the study of cellular and tissue-wide heterogeneity in a large number of biological scenarios, from highly tissue-specific studies of disease to human-wide cell atlases. A central task in single-cell RNA-sequencing analysis design is the calculation of cell type-specific genes in order to study the differential impact of different replicates (e.g. tumour vs. non-tumour environment) on the regulation of those genes and their associated networks. The crucial task is the efficient and reliable calculation of such cell type-specific ‘marker’ genes. These optimise the ability of the experiment to isolate highly specific cell phenotypes of interest to the analyser. However, while methods exist that can calculate marker genes from single-cell RNA-sequencing, no such method places emphasis on specific cell phenotypes for downstream study in e.g. differential gene expression or other experimental protocols (spatial transcriptomics protocols for example). Here we present SMaSH, a general computational framework for extracting key marker genes from single-cell RNA-sequencing data which reliably characterise highly specific and niche populations of cells in numerous different biological data-sets. RESULTS: SMaSH extracts robust and biologically well-motivated marker genes, which characterise a given single-cell RNA-sequencing data-set better than existing computational approaches for general marker gene calculation. We demonstrate the utility of SMaSH through its substantial performance improvement over several existing methods in the field. Furthermore, we evaluate the SMaSH markers on spatial transcriptomics data, demonstrating they identify highly localised compartments of the mouse cortex. CONCLUSION: SMaSH is a new methodology for calculating robust marker genes from large single-cell RNA-sequencing data-sets, and has implications for e.g. effective gene identification for probe design in downstream analyses such as spatial transcriptomics experiments. SMaSH has been fully integrated with the ScanPy framework and provides a valuable bioinformatics tool for cell type characterisation and validation in ever-growing data-sets spanning over 50 different cell types across hundreds of thousands of cells. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04860-2. |
format | Online Article Text |
id | pubmed-9361618 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9361618 2022-08-10 SMaSH: a scalable, general marker gene identification framework for single-cell RNA-sequencing Nelson, M. E. Riva, S. G. Cvejic, A. BMC Bioinformatics Research BioMed Central 2022-08-08 /pmc/articles/PMC9361618/ /pubmed/35941549 http://dx.doi.org/10.1186/s12859-022-04860-2 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Nelson, M. E. Riva, S. G. Cvejic, A. SMaSH: a scalable, general marker gene identification framework for single-cell RNA-sequencing |
title | SMaSH: a scalable, general marker gene identification framework for single-cell RNA-sequencing |
title_full | SMaSH: a scalable, general marker gene identification framework for single-cell RNA-sequencing |
title_fullStr | SMaSH: a scalable, general marker gene identification framework for single-cell RNA-sequencing |
title_full_unstemmed | SMaSH: a scalable, general marker gene identification framework for single-cell RNA-sequencing |
title_short | SMaSH: a scalable, general marker gene identification framework for single-cell RNA-sequencing |
title_sort | smash: a scalable, general marker gene identification framework for single-cell rna-sequencing |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9361618/ https://www.ncbi.nlm.nih.gov/pubmed/35941549 http://dx.doi.org/10.1186/s12859-022-04860-2 |
work_keys_str_mv | AT nelsonme smashascalablegeneralmarkergeneidentificationframeworkforsinglecellrnasequencing AT rivasg smashascalablegeneralmarkergeneidentificationframeworkforsinglecellrnasequencing AT cvejica smashascalablegeneralmarkergeneidentificationframeworkforsinglecellrnasequencing |