LFSC: A linear fast semi-supervised clustering algorithm that integrates reference-bulk and single-cell transcriptomes

The identification of cell types in complex tissues is an important step in research into cellular heterogeneity in disease. We present a linear fast semi-supervised clustering (LFSC) algorithm that utilizes reference samples generated from bulk RNA sequencing data to identify cell types from single...

Bibliographic Details
Main Authors: Liu, Qiaoming; Liang, Yingjian; Wang, Dong; Li, Jie
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9754124/
https://www.ncbi.nlm.nih.gov/pubmed/36531230
http://dx.doi.org/10.3389/fgene.2022.1068075
author Liu, Qiaoming
Liang, Yingjian
Wang, Dong
Li, Jie
author_facet Liu, Qiaoming
Liang, Yingjian
Wang, Dong
Li, Jie
author_sort Liu, Qiaoming
collection PubMed
description The identification of cell types in complex tissues is an important step in research into cellular heterogeneity in disease. We present a linear fast semi-supervised clustering (LFSC) algorithm that utilizes reference samples generated from bulk RNA sequencing data to identify cell types from single-cell transcriptomes. An anchor graph is constructed to depict the relationship between reference samples and cells. By applying a connectivity constraint to the learned graph, LFSC enables the preservation of the underlying cluster structure. Moreover, the overall complexity of LFSC is linear to the size of the data, which greatly improves effectiveness and efficiency. By applying LFSC to real single-cell RNA sequencing datasets, we discovered that it has superior performance over existing baseline methods in clustering accuracy and robustness. An application using infiltrating T cells in liver cancer demonstrates that LFSC can successfully find new cell types, discover differently expressed genes, and explore new cancer-associated biomarkers.
format Online
Article
Text
id pubmed-9754124
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9754124 2022-12-16 LFSC: A linear fast semi-supervised clustering algorithm that integrates reference-bulk and single-cell transcriptomes Liu, Qiaoming Liang, Yingjian Wang, Dong Li, Jie Front Genet Genetics The identification of cell types in complex tissues is an important step in research into cellular heterogeneity in disease. We present a linear fast semi-supervised clustering (LFSC) algorithm that utilizes reference samples generated from bulk RNA sequencing data to identify cell types from single-cell transcriptomes. An anchor graph is constructed to depict the relationship between reference samples and cells. By applying a connectivity constraint to the learned graph, LFSC enables the preservation of the underlying cluster structure. Moreover, the overall complexity of LFSC is linear to the size of the data, which greatly improves effectiveness and efficiency. By applying LFSC to real single-cell RNA sequencing datasets, we discovered that it has superior performance over existing baseline methods in clustering accuracy and robustness. An application using infiltrating T cells in liver cancer demonstrates that LFSC can successfully find new cell types, discover differently expressed genes, and explore new cancer-associated biomarkers. Frontiers Media S.A. 2022-12-01 /pmc/articles/PMC9754124/ /pubmed/36531230 http://dx.doi.org/10.3389/fgene.2022.1068075 Text en Copyright © 2022 Liu, Liang, Wang and Li. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Genetics
Liu, Qiaoming
Liang, Yingjian
Wang, Dong
Li, Jie
LFSC: A linear fast semi-supervised clustering algorithm that integrates reference-bulk and single-cell transcriptomes
title LFSC: A linear fast semi-supervised clustering algorithm that integrates reference-bulk and single-cell transcriptomes
title_full LFSC: A linear fast semi-supervised clustering algorithm that integrates reference-bulk and single-cell transcriptomes
title_fullStr LFSC: A linear fast semi-supervised clustering algorithm that integrates reference-bulk and single-cell transcriptomes
title_full_unstemmed LFSC: A linear fast semi-supervised clustering algorithm that integrates reference-bulk and single-cell transcriptomes
title_short LFSC: A linear fast semi-supervised clustering algorithm that integrates reference-bulk and single-cell transcriptomes
title_sort lfsc: a linear fast semi-supervised clustering algorithm that integrates reference-bulk and single-cell transcriptomes
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9754124/
https://www.ncbi.nlm.nih.gov/pubmed/36531230
http://dx.doi.org/10.3389/fgene.2022.1068075