CellMeSH: probabilistic cell-type identification using indexed literature
MOTIVATION: Single-cell RNA sequencing (scRNA-seq) is widely used for analyzing gene expression in multi-cellular systems and provides unprecedented access to cellular heterogeneity. scRNA-seq experiments aim to identify and quantify all cell types present in a sample. Measured single-cell transcrip...
Main authors: | Mao, Shunfu; Zhang, Yue; Seelig, Georg; Kannan, Sreeram |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2021 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8826164/ https://www.ncbi.nlm.nih.gov/pubmed/34893819 http://dx.doi.org/10.1093/bioinformatics/btab834 |
_version_ | 1784647375476752384 |
---|---|
author | Mao, Shunfu; Zhang, Yue; Seelig, Georg; Kannan, Sreeram |
author_facet | Mao, Shunfu; Zhang, Yue; Seelig, Georg; Kannan, Sreeram |
author_sort | Mao, Shunfu |
collection | PubMed |
description | MOTIVATION: Single-cell RNA sequencing (scRNA-seq) is widely used for analyzing gene expression in multi-cellular systems and provides unprecedented access to cellular heterogeneity. scRNA-seq experiments aim to identify and quantify all cell types present in a sample. Measured single-cell transcriptomes are grouped by similarity and the resulting clusters are mapped to cell types based on cluster-specific gene expression patterns. While the process of generating clusters has become largely automated, annotation remains a laborious ad hoc effort that requires expert biological knowledge. RESULTS: Here, we introduce CellMeSH—a new automated approach to identifying cell types for clusters based on prior literature. CellMeSH combines a database of gene–cell-type associations with a probabilistic method for database querying. The database is constructed by automatically linking gene and cell-type information from millions of publications using existing indexed literature resources. Compared to manually constructed databases, CellMeSH is more comprehensive and is easily updated with new data. The probabilistic query method enables reliable information retrieval even though the gene–cell-type associations extracted from the literature are noisy. CellMeSH is also able to optionally utilize prior knowledge about tissues or cells for further annotation improvement. CellMeSH achieves top-one and top-three accuracies on a number of mouse and human datasets that are consistently better than existing approaches. AVAILABILITY AND IMPLEMENTATION: Web server at https://uncurl.cs.washington.edu/db_query and API at https://github.com/shunfumao/cellmesh. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-8826164 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8826164 2022-02-09 CellMeSH: probabilistic cell-type identification using indexed literature. Mao, Shunfu; Zhang, Yue; Seelig, Georg; Kannan, Sreeram. Bioinformatics, Original Papers. Oxford University Press 2021-12-10 /pmc/articles/PMC8826164/ /pubmed/34893819 http://dx.doi.org/10.1093/bioinformatics/btab834 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | CellMeSH: probabilistic cell-type identification using indexed literature |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8826164/ https://www.ncbi.nlm.nih.gov/pubmed/34893819 http://dx.doi.org/10.1093/bioinformatics/btab834 |