Machine learning for pan-cancer classification based on RNA sequencing data

Despite recent improvements in cancer diagnostics, 2%-5% of all malignancies are still cancers of unknown primary (CUP), for which the tissue-of-origin (TOO) cannot be determined at the time of presentation. Since the primary site of cancer leads to the choice of optimal treatment, CUP patients pose...

Bibliographic Details
Main authors: Štancl, Paula; Karlić, Rosa
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10667476/
https://www.ncbi.nlm.nih.gov/pubmed/38028533
http://dx.doi.org/10.3389/fmolb.2023.1285795
_version_ 1785139258448674816
author Štancl, Paula
Karlić, Rosa
collection PubMed
description Despite recent improvements in cancer diagnostics, 2%-5% of all malignancies are still cancers of unknown primary (CUP), for which the tissue-of-origin (TOO) cannot be determined at the time of presentation. Since the primary site of cancer leads to the choice of optimal treatment, CUP patients pose a significant clinical challenge with limited treatment options. Data produced by large-scale cancer genomics initiatives, which aim to determine the genomic, epigenomic, and transcriptomic characteristics of a large number of individual patients of multiple cancer types, have led to the introduction of various methods that use machine learning to predict the TOO of cancer patients. In this review, we assess the reproducibility, interpretability, and robustness of results obtained by 20 recent studies that utilize different machine learning methods for TOO prediction based on RNA sequencing data, including their reported performance on independent data sets and identification of important features. Our review investigates the strengths and weaknesses of different methods, checks the correspondence of their results, and identifies potential issues with datasets used for model training and testing, assessing their potential usefulness in a clinical setting and suggesting future improvements.
format Online
Article
Text
id pubmed-10667476
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10667476 2023-01-01 Machine learning for pan-cancer classification based on RNA sequencing data Štancl, Paula Karlić, Rosa Front Mol Biosci Molecular Biosciences Frontiers Media S.A. 2023-11-10 /pmc/articles/PMC10667476/ /pubmed/38028533 http://dx.doi.org/10.3389/fmolb.2023.1285795 Text en Copyright © 2023 Štancl and Karlić. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY).
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Machine learning for pan-cancer classification based on RNA sequencing data
topic Molecular Biosciences