Pan-cancer classification by regularized multi-task learning

Classifying pan-cancer samples using gene expression patterns is a crucial challenge for the accurate diagnosis and treatment of cancer patients. Machine learning algorithms have been considered proven tools to perform downstream analysis and capture the deviations in gene expression patterns across...

Bibliographic Details
Main authors: Hossain, Sk Md Mosaddek; Khatun, Lutfunnesa; Ray, Sumanta; Mukhopadhyay, Anirban
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8688544/
https://www.ncbi.nlm.nih.gov/pubmed/34930937
http://dx.doi.org/10.1038/s41598-021-03554-8
author Hossain, Sk Md Mosaddek
Khatun, Lutfunnesa
Ray, Sumanta
Mukhopadhyay, Anirban
collection PubMed
description Classifying pan-cancer samples by their gene expression patterns is a crucial challenge for the accurate diagnosis and treatment of cancer patients. Machine learning algorithms are established tools for downstream analysis and for capturing deviations in gene expression patterns across diverse diseases. In the present work, we have developed PC-RMTL, a pan-cancer classification model that uses regularized multi-task learning (RMTL) to classify 21 cancer types and adjacent normal samples from RNASeq data obtained from TCGA. PC-RMTL outperforms five state-of-the-art classification algorithms: SVM with a linear kernel (SVM-Lin), SVM with a radial basis function kernel (SVM-RBF), random forest (RF), k-nearest neighbours (kNN), and decision trees (DT). PC-RMTL achieves 96.07% accuracy and a 95.80% MCC score on a completely unseen independent test set. The only real competitor is SVM-Lin, which nearly matches the prediction accuracy of PC-RMTL, but only when complete feature sets are provided for training; otherwise, PC-RMTL outperforms all other classification models. To the best of our knowledge, this is a significant improvement over existing work in pan-cancer classification, which has failed to reliably distinguish many cancer types from one another. We have also compared the expression patterns of the top discriminating genes across the cancers and performed a functional enrichment analysis of these genes, which uncovers several interesting facts about distinguishing pan-cancer samples.
format Online
Article
Text
id pubmed-8688544
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-8688544 2021-12-22 Pan-cancer classification by regularized multi-task learning Hossain, Sk Md Mosaddek Khatun, Lutfunnesa Ray, Sumanta Mukhopadhyay, Anirban Sci Rep Article
Nature Publishing Group UK 2021-12-20 /pmc/articles/PMC8688544/ /pubmed/34930937 http://dx.doi.org/10.1038/s41598-021-03554-8 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Pan-cancer classification by regularized multi-task learning
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8688544/
https://www.ncbi.nlm.nih.gov/pubmed/34930937
http://dx.doi.org/10.1038/s41598-021-03554-8