
PreCanCell: An ensemble learning algorithm for predicting cancer and non-cancer cells from single-cell transcriptomes


Bibliographic Details
Main Authors: Yang, Tao; Yan, Qiyu; Long, Rongzhuo; Liu, Zhixian; Wang, Xiaosheng
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10371765/
https://www.ncbi.nlm.nih.gov/pubmed/37501705
http://dx.doi.org/10.1016/j.csbj.2023.07.009
Description
Summary: We propose PreCanCell, a novel algorithm for predicting malignant and non-malignant cells from single-cell transcriptomes. PreCanCell first identifies the genes that are differentially expressed (DEGs) between malignant and non-malignant cells across all five single-cell transcriptome datasets, each associated with a common cancer type: renal cell carcinoma (RCC), head and neck squamous cell carcinoma (HNSCC), melanoma, lung adenocarcinoma (LUAD), and breast cancer (BC). Using each of the five datasets in turn as the training set and the DEGs as the features, k-NN (k = 5) classifies a single cell as malignant or non-malignant. The final call for the cell is the majority vote of the five k-NN classification results. We tested the predictive performance of PreCanCell on 19 single-cell datasets and report classification accuracy, sensitivity, specificity, balanced accuracy (the average of sensitivity and specificity), and the area under the receiver operating characteristic curve (AUROC). In all of these datasets, PreCanCell achieved accuracy, sensitivity, specificity, balanced accuracy, and AUROC above 0.8. Finally, we compared the predictive performance of PreCanCell with that of seven other algorithms: CHETAH, SciBet, SCINA, scmap-cell, scmap-cluster, SingleR, and ikarus. Compared with these algorithms, PreCanCell offers higher accuracy and a simpler implementation. We have developed an R package for the PreCanCell algorithm, which is available at https://github.com/WangX-Lab/PreCanCell.
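
The voting scheme described in the summary (one k-NN classifier per training dataset, followed by a majority vote) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' R package: the function names, the Euclidean distance metric, and the toy two-feature data are all hypothetical, and real inputs would be expression values of the shared DEGs.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k=5):
    """Label x by majority vote among its k nearest training cells."""
    # Assumption: Euclidean distance; the summary does not name a metric.
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def ensemble_predict(training_sets, x, k=5):
    """Majority vote over one k-NN call per training set."""
    calls = [knn_predict(X, y, x, k) for X, y in training_sets]
    return Counter(calls).most_common(1)[0][0]


def balanced_accuracy(y_true, y_pred, positive="malignant"):
    """Average of sensitivity and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    n_pos = sum(t == positive for t in y_true)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def make_toy_set(shift):
    """Hypothetical training set: two well-separated clusters of six cells."""
    malignant = [(5.0 + shift + 0.1 * i, 5.0 - 0.1 * i) for i in range(6)]
    normal = [(0.0 + shift + 0.1 * i, 0.0 + 0.1 * i) for i in range(6)]
    labels = ["malignant"] * 6 + ["non-malignant"] * 6
    return malignant + normal, labels


# Five toy training sets stand in for the five cancer-type datasets.
training_sets = [make_toy_set(0.2 * j) for j in range(5)]

call_a = ensemble_predict(training_sets, (4.8, 5.1))
call_b = ensemble_predict(training_sets, (0.3, 0.2))
```

With two classes, an odd k and an odd number of training sets mean that neither the per-dataset vote nor the final vote can tie, so no tie-breaking rule is needed in this sketch.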