
Can we infer tumor presence of single cell transcriptomes and their tumor of origin from bulk transcriptomes by machine learning?

There is a growing need for models that use single-cell RNA-seq (scRNA-seq) to separate malignant from nonmalignant cells and to identify the tumor of origin of single cells and/or circulating tumor cells (CTCs). Currently, it is infeasible to build a tumor-of-origin model learnt from scRNA...


Bibliographic Details
Main authors: Liu, Hua-Ping, Wang, Dongwen, Lai, Hung-Ming
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9162953/
https://www.ncbi.nlm.nih.gov/pubmed/35685355
http://dx.doi.org/10.1016/j.csbj.2022.05.035
_version_ 1784719823470592000
author Liu, Hua-Ping
Wang, Dongwen
Lai, Hung-Ming
collection PubMed
description There is a growing need for models that use single-cell RNA-seq (scRNA-seq) to separate malignant from nonmalignant cells and to identify the tumor of origin of single cells and/or circulating tumor cells (CTCs). Currently, it is infeasible to build a tumor-of-origin model learnt from scRNA-seq by machine learning (ML). We therefore asked whether an ML model learnt from bulk transcriptomes can be applied to scRNA-seq to infer the tumor presence of single cells and, further, to indicate their tumor of origin. We used k-nearest neighbors, one-versus-all support vector machine, one-versus-one support vector machine and random forest, and introduced scTumorTrace, to conduct a pioneering experiment covering leukocytes and seven major cancer types for which both bulk RNA-seq and scRNA-seq data were available. All 13 ML models learnt from bulk RNA-seq were reliable (F-score > 96%) on a validation set of bulk transcriptomes, but none of them was applicable to scRNA-seq except scTumorTrace. Inference from bulk RNA-seq to scRNA-seq was impaired by feature selection and improved by log2-transformed TPM units. scTumorTrace with transcriptome-wide 2-tuples achieved F-scores above 98.74% and 94.29% in inferring tumor presence and tumor of origin, respectively, at single-cell resolution, and correctly identified as leukocytes 45 single cells that were candidate prostate CTCs but lineage-confirmed non-CTCs. We concluded that modern ML techniques are quantitative and could hardly address the questions raised. scTumorTrace with transcriptome-wide 2-tuples is qualitative, standardization-free and not subject to log2-transformed quantities, enabling us to infer the tumor presence of single-cell transcriptomes and their tumor of origin from bulk transcriptomes.
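The description above calls the 2-tuple representation qualitative, standardization-free and not subject to log2 transformation. The record does not define a 2-tuple, so the sketch below rests on an assumption: that a 2-tuple is the within-sample ordering of a pair of genes (which of the two is expressed higher). Under that reading, the representation is unchanged by any strictly increasing rescaling of the expression values. The function name `pair_order_profile` and the simulated TPM values are hypothetical, and this is an illustration of the invariance property only, not the authors' scTumorTrace implementation.

```python
import math
import random
from itertools import combinations


def pair_order_profile(expression):
    """Encode one sample as 0/1 flags, one per gene pair (i, j) with i < j.

    A flag is 1 when gene i is expressed strictly above gene j.
    Assumption: this pairwise ordering stands in for a "2-tuple".
    """
    return tuple(
        int(expression[i] > expression[j])
        for i, j in combinations(range(len(expression)), 2)
    )


# Hypothetical TPM-like values for six genes in one sample.
random.seed(0)
tpm = [random.uniform(0.0, 500.0) for _ in range(6)]

raw_profile = pair_order_profile(tpm)
# log2(x + 1) is strictly increasing, so pair orderings are preserved.
log_profile = pair_order_profile([math.log2(x + 1.0) for x in tpm])
# A positive per-sample scale factor (a crude stand-in for
# standardization differences) also preserves pair orderings.
scaled_profile = pair_order_profile([3.7 * x for x in tpm])

print(raw_profile == log_profile == scaled_profile)
```

Because only orderings within a sample are compared, a profile built from bulk data and one built from a single cell live in the same 0/1 space regardless of units, which is one way to read "qualitative" in the description.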
format Online
Article
Text
id pubmed-9162953
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-9162953 2022-06-08 Can we infer tumor presence of single cell transcriptomes and their tumor of origin from bulk transcriptomes by machine learning? Liu, Hua-Ping Wang, Dongwen Lai, Hung-Ming Comput Struct Biotechnol J Research Article
Research Network of Computational and Structural Biotechnology 2022-05-23 /pmc/articles/PMC9162953/ /pubmed/35685355 http://dx.doi.org/10.1016/j.csbj.2022.05.035 Text en © 2022 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Can we infer tumor presence of single cell transcriptomes and their tumor of origin from bulk transcriptomes by machine learning?
topic Research Article