LFSC: A linear fast semi-supervised clustering algorithm that integrates reference-bulk and single-cell transcriptomes
The identification of cell types in complex tissues is an important step in research into cellular heterogeneity in disease. We present a linear fast semi-supervised clustering (LFSC) algorithm that utilizes reference samples generated from bulk RNA sequencing data to identify cell types from single...
Main authors: | Liu, Qiaoming; Liang, Yingjian; Wang, Dong; Li, Jie |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2022 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9754124/ https://www.ncbi.nlm.nih.gov/pubmed/36531230 http://dx.doi.org/10.3389/fgene.2022.1068075 |
_version_ | 1784851117659652096 |
---|---|
author | Liu, Qiaoming Liang, Yingjian Wang, Dong Li, Jie |
author_facet | Liu, Qiaoming Liang, Yingjian Wang, Dong Li, Jie |
author_sort | Liu, Qiaoming |
collection | PubMed |
description | The identification of cell types in complex tissues is an important step in research into cellular heterogeneity in disease. We present a linear fast semi-supervised clustering (LFSC) algorithm that utilizes reference samples generated from bulk RNA sequencing data to identify cell types from single-cell transcriptomes. An anchor graph is constructed to depict the relationship between reference samples and cells. By applying a connectivity constraint to the learned graph, LFSC enables the preservation of the underlying cluster structure. Moreover, the overall complexity of LFSC is linear in the size of the data, which greatly improves effectiveness and efficiency. By applying LFSC to real single-cell RNA sequencing datasets, we found that it outperforms existing baseline methods in clustering accuracy and robustness. An application using infiltrating T cells in liver cancer demonstrates that LFSC can successfully find new cell types, discover differentially expressed genes, and explore new cancer-associated biomarkers. |
format | Online Article Text |
id | pubmed-9754124 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9754124 2022-12-16 LFSC: A linear fast semi-supervised clustering algorithm that integrates reference-bulk and single-cell transcriptomes Liu, Qiaoming Liang, Yingjian Wang, Dong Li, Jie Front Genet Genetics The identification of cell types in complex tissues is an important step in research into cellular heterogeneity in disease. We present a linear fast semi-supervised clustering (LFSC) algorithm that utilizes reference samples generated from bulk RNA sequencing data to identify cell types from single-cell transcriptomes. An anchor graph is constructed to depict the relationship between reference samples and cells. By applying a connectivity constraint to the learned graph, LFSC enables the preservation of the underlying cluster structure. Moreover, the overall complexity of LFSC is linear in the size of the data, which greatly improves effectiveness and efficiency. By applying LFSC to real single-cell RNA sequencing datasets, we found that it outperforms existing baseline methods in clustering accuracy and robustness. An application using infiltrating T cells in liver cancer demonstrates that LFSC can successfully find new cell types, discover differentially expressed genes, and explore new cancer-associated biomarkers. Frontiers Media S.A. 2022-12-01 /pmc/articles/PMC9754124/ /pubmed/36531230 http://dx.doi.org/10.3389/fgene.2022.1068075 Text en Copyright © 2022 Liu, Liang, Wang and Li. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Genetics Liu, Qiaoming Liang, Yingjian Wang, Dong Li, Jie LFSC: A linear fast semi-supervised clustering algorithm that integrates reference-bulk and single-cell transcriptomes |
title | LFSC: A linear fast semi-supervised clustering algorithm that integrates reference-bulk and single-cell transcriptomes |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9754124/ https://www.ncbi.nlm.nih.gov/pubmed/36531230 http://dx.doi.org/10.3389/fgene.2022.1068075 |
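
The description above mentions an anchor graph that relates reference samples to cells. The record does not give the LFSC formulation, so the following is only a generic illustration of the anchor-graph idea, not the authors' method: each cell is linked to its `k` nearest reference profiles with kernel weights that sum to one, and a cell type is read off from the heaviest links. The function names (`anchor_graph`, `assign_types`), the Gaussian kernel, and the toy data are all assumptions made for this sketch; the connectivity constraint described in the abstract is not implemented here.

```python
import numpy as np

def anchor_graph(cells, anchors, k=2):
    """Hypothetical helper: row-normalized cell-to-anchor weights Z (n_cells x n_anchors)."""
    # Squared Euclidean distance from every cell to every reference anchor.
    d2 = ((cells[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    n, m = d2.shape
    k = min(k, m)
    # Column indices of the k nearest anchors for each cell (unordered).
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
    rows = np.arange(n)[:, None]
    dk = d2[rows, nearest]
    # Gaussian kernel with a per-cell bandwidth (an assumed choice).
    sigma = dk.mean(axis=1, keepdims=True) + 1e-12
    w = np.exp(-dk / sigma)
    w /= w.sum(axis=1, keepdims=True)
    # Scatter the k weights into a matrix that is zero elsewhere.
    Z = np.zeros((n, m))
    Z[rows, nearest] = w
    return Z

def assign_types(Z, anchor_types):
    """Hypothetical helper: pick, per cell, the type with the largest total anchor weight."""
    types = sorted(set(anchor_types))
    onehot = np.array([[t == u for u in types] for t in anchor_types], dtype=float)
    scores = Z @ onehot  # n_cells x n_types
    return [types[i] for i in scores.argmax(axis=1)]

# Toy data (made up): two reference profiles and two cells in a 2-gene space.
anchors = np.array([[0.0, 0.0], [10.0, 10.0]])
anchor_types = ["T cell", "B cell"]
cells = np.array([[0.1, 0.0], [9.8, 10.1]])

Z = anchor_graph(cells, anchors, k=2)
print(assign_types(Z, anchor_types))
```

For a fixed number of anchors and genes, the work in this sketch grows in proportion to the number of cells, which is the general reason anchor-based graphs scale better than a full cell-to-cell graph.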