scMatch: a single-cell gene expression profile annotation tool using reference datasets

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) measures gene expression at the resolution of individual cells. Massively multiplexed single-cell profiling has enabled large-scale transcriptional analyses of thousands of cells in complex tissues. In most cases, the true identity of individual cel...

Bibliographic Details
Main Authors: Hou, Rui; Denisenko, Elena; Forrest, Alistair R R
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6853649/
https://www.ncbi.nlm.nih.gov/pubmed/31028376
http://dx.doi.org/10.1093/bioinformatics/btz292
author Hou, Rui
Denisenko, Elena
Forrest, Alistair R R
collection PubMed
description MOTIVATION: Single-cell RNA sequencing (scRNA-seq) measures gene expression at the resolution of individual cells. Massively multiplexed single-cell profiling has enabled large-scale transcriptional analyses of thousands of cells in complex tissues. In most cases, the true identity of individual cells is unknown and needs to be inferred from the transcriptomic data. Existing methods typically cluster (group) cells based on similarities of their gene expression profiles and assign the same identity to all cells within each cluster using the averaged expression levels. However, scRNA-seq experiments typically produce low-coverage sequencing data for each cell, which hinders the clustering process. RESULTS: We introduce scMatch, which directly annotates single cells by identifying their closest match in large reference datasets. We used this strategy to annotate various single-cell datasets and evaluated the impacts of sequencing depth, similarity metric and reference datasets. We found that scMatch can rapidly and robustly annotate single cells with comparable accuracy to another recent cell annotation tool (SingleR), but that it is quicker and can handle larger reference datasets. We demonstrate how scMatch can handle large customized reference gene expression profiles that combine data from multiple sources, thus empowering researchers to identify cell populations in any complex tissue with the desired precision. AVAILABILITY AND IMPLEMENTATION: scMatch (Python code) and the FANTOM5 reference dataset are freely available to the research community at https://github.com/forrest-lab/scMatch. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-6853649
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6853649 2019-11-19 scMatch: a single-cell gene expression profile annotation tool using reference datasets Hou, Rui Denisenko, Elena Forrest, Alistair R R Bioinformatics Original Papers
Oxford University Press 2019-11-15 2019-04-26 /pmc/articles/PMC6853649/ /pubmed/31028376 http://dx.doi.org/10.1093/bioinformatics/btz292 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title scMatch: a single-cell gene expression profile annotation tool using reference datasets
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6853649/
https://www.ncbi.nlm.nih.gov/pubmed/31028376
http://dx.doi.org/10.1093/bioinformatics/btz292
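
The abstract in this record describes annotating each single cell by finding its closest match among reference expression profiles. The sketch below illustrates that general idea only; it is not the scMatch code. It assumes Spearman rank correlation as the similarity metric (the abstract says similarity metrics were evaluated but does not name them), and the function names, gene order, and expression values are all hypothetical.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(rank(x), rank(y))


def annotate(cell, references):
    """Return (label, score) of the reference profile most similar to cell."""
    scored = ((label, spearman(cell, profile))
              for label, profile in references.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical reference profiles over the same four genes, in the same order.
references = {
    "T cell": [9.0, 8.0, 0.5, 0.1],
    "hepatocyte": [0.2, 0.1, 7.0, 9.5],
}
# Hypothetical low-coverage cell: two genes detected, two dropouts.
cell = [5.0, 3.0, 0.0, 0.0]

label, score = annotate(cell, references)
print(label, round(score, 3))
```

A rank-based metric is used here because it depends only on the ordering of expression values, which makes the toy comparison insensitive to differences in scale between the cell and the references.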