CellMeSH: probabilistic cell-type identification using indexed literature

Bibliographic Details
Main authors: Mao, Shunfu; Zhang, Yue; Seelig, Georg; Kannan, Sreeram
Format: Online Article Text
Language: English
Published: Oxford University Press, 2021
Subjects: Original Papers
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8826164/
https://www.ncbi.nlm.nih.gov/pubmed/34893819
http://dx.doi.org/10.1093/bioinformatics/btab834
author Mao, Shunfu
Zhang, Yue
Seelig, Georg
Kannan, Sreeram
collection PubMed
description MOTIVATION: Single-cell RNA sequencing (scRNA-seq) is widely used for analyzing gene expression in multi-cellular systems and provides unprecedented access to cellular heterogeneity. scRNA-seq experiments aim to identify and quantify all cell types present in a sample. Measured single-cell transcriptomes are grouped by similarity and the resulting clusters are mapped to cell types based on cluster-specific gene expression patterns. While the process of generating clusters has become largely automated, annotation remains a laborious ad hoc effort that requires expert biological knowledge.
RESULTS: Here, we introduce CellMeSH, a new automated approach to identifying cell types for clusters based on prior literature. CellMeSH combines a database of gene–cell-type associations with a probabilistic method for database querying. The database is constructed by automatically linking gene and cell-type information from millions of publications using existing indexed literature resources. Compared to manually constructed databases, CellMeSH is more comprehensive and is easily updated with new data. The probabilistic query method enables reliable information retrieval even though the gene–cell-type associations extracted from the literature are noisy. CellMeSH is also able to optionally utilize prior knowledge about tissues or cells for further annotation improvement. CellMeSH achieves top-one and top-three accuracies on a number of mouse and human datasets that are consistently better than existing approaches.
AVAILABILITY AND IMPLEMENTATION: Web server at https://uncurl.cs.washington.edu/db_query and API at https://github.com/shunfumao/cellmesh.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-8826164
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8826164 2022-02-09 CellMeSH: probabilistic cell-type identification using indexed literature Mao, Shunfu; Zhang, Yue; Seelig, Georg; Kannan, Sreeram Bioinformatics Original Papers
Oxford University Press 2021-12-10 /pmc/articles/PMC8826164/ /pubmed/34893819 http://dx.doi.org/10.1093/bioinformatics/btab834 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title CellMeSH: probabilistic cell-type identification using indexed literature
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8826164/
https://www.ncbi.nlm.nih.gov/pubmed/34893819
http://dx.doi.org/10.1093/bioinformatics/btab834