Pan-cancer classification by regularized multi-task learning
Main authors: | Hossain, Sk Md Mosaddek; Khatun, Lutfunnesa; Ray, Sumanta; Mukhopadhyay, Anirban |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2021 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8688544/ https://www.ncbi.nlm.nih.gov/pubmed/34930937 http://dx.doi.org/10.1038/s41598-021-03554-8 |
author | Hossain, Sk Md Mosaddek; Khatun, Lutfunnesa; Ray, Sumanta; Mukhopadhyay, Anirban |
collection | PubMed |
description | Classifying pan-cancer samples by their gene expression patterns is a crucial challenge for the accurate diagnosis and treatment of cancer patients. Machine learning algorithms have proven to be effective tools for downstream analysis and for capturing deviations in gene expression patterns across diverse diseases. In the present work, we have developed PC-RMTL, a pan-cancer classification model based on regularized multi-task learning (RMTL), to classify 21 cancer types and adjacent normal samples using RNA-seq data obtained from TCGA. PC-RMTL outperforms five state-of-the-art classification algorithms, viz. SVM with a linear kernel (SVM-Lin), SVM with a radial basis function kernel (SVM-RBF), random forest (RF), k-nearest neighbours (kNN), and decision trees (DT). PC-RMTL achieves 96.07% accuracy and a 95.80% MCC score on a completely unseen independent test set. The only real competitor is SVM-Lin, which nearly matches the prediction accuracy of PC-RMTL, but only when the complete feature set is provided for training; otherwise, PC-RMTL outperforms all other classification models. To the best of our knowledge, this is a significant improvement over all existing works in pan-cancer classification, which have failed to reliably distinguish many cancer types from one another. We have also compared the expression patterns of the top discriminating genes across the cancers and performed a functional enrichment analysis of these genes, which uncovers several interesting facts for distinguishing pan-cancer samples. |
format | Online Article Text |
id | pubmed-8688544 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
journal | Sci Rep |
published | 2021-12-20 |
rights | © The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. |
title | Pan-cancer classification by regularized multi-task learning |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8688544/ https://www.ncbi.nlm.nih.gov/pubmed/34930937 http://dx.doi.org/10.1038/s41598-021-03554-8 |
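The abstract describes PC-RMTL only at a high level. As a rough illustration of what "regularized multi-task learning" can mean for a multiclass expression classifier, the sketch below treats each class as a one-vs-rest logistic-regression task and couples the tasks with an L2,1 (group) penalty, so a gene's weights are kept or zeroed for all tasks at once. This is an assumption for illustration only, not the authors' formulation and not the API of any RMTL package. It also computes accuracy and a multiclass MCC (the confusion-matrix generalization that scikit-learn's `matthews_corrcoef` uses for multiclass input), the two metrics the abstract reports. The helper names (`prox_l21`, `fit_l21_mtl`, `multiclass_mcc`, `make_toy_data`), the hyperparameters, and the synthetic data are all hypothetical; no TCGA data are involved.

```python
import numpy as np


def prox_l21(W, t):
    """Proximal operator of t * sum_j ||W[j, :]||_2 (row-wise group soft-thresholding).

    Each row holds one feature's weights across all tasks, so a row is either
    shrunk for every task or zeroed for every task (joint feature selection).
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return W * scale


def fit_l21_mtl(X, y, n_classes, lam=0.05, lr=0.1, n_iter=500):
    """Hypothetical multi-task model: one one-vs-rest logistic task per class.

    Minimizes sum_k mean_i log(1 + exp(-Y_ik (x_i . W[:, k] + b_k))) + lam * ||W||_{2,1}
    with plain proximal gradient descent and a fixed step size lr.
    """
    n, d = X.shape
    # Task k labels: +1 for samples of class k, -1 for all other samples.
    Y = np.where(y[:, None] == np.arange(n_classes)[None, :], 1.0, -1.0)
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    for _ in range(n_iter):
        margins = Y * (X @ W + b)
        # d/dz log(1 + exp(-y z)) = -y / (1 + exp(y z)); logaddexp avoids overflow.
        coef = -Y * np.exp(-np.logaddexp(0.0, margins)) / n
        W = prox_l21(W - lr * (X.T @ coef), lr * lam)
        b = b - lr * coef.sum(axis=0)
    return W, b


def predict(X, W, b):
    # Assign each sample to the task (class) with the highest score.
    return np.argmax(X @ W + b, axis=1)


def multiclass_mcc(y_true, y_pred, n_classes):
    """Multiclass Matthews correlation coefficient computed from the confusion matrix."""
    C = np.zeros((n_classes, n_classes))
    np.add.at(C, (y_true, y_pred), 1)
    t = C.sum(axis=1)  # true samples per class
    p = C.sum(axis=0)  # predicted samples per class
    c, s = np.trace(C), C.sum()
    den = np.sqrt((s**2 - p @ p) * (s**2 - t @ t))
    return 0.0 if den == 0 else (c * s - p @ t) / den


def make_toy_data(rng, n_per_class=60, n_classes=3, n_features=50, shift=4.0):
    # Synthetic stand-in for expression data: only feature k is informative for class k.
    y = np.repeat(np.arange(n_classes), n_per_class)
    X = rng.normal(size=(y.size, n_features))
    X[np.arange(y.size), y] += shift
    return X, y


rng = np.random.default_rng(0)
X_train, y_train = make_toy_data(rng)
X_test, y_test = make_toy_data(rng)  # held-out set drawn independently
W, b = fit_l21_mtl(X_train, y_train, n_classes=3)
y_hat = predict(X_test, W, b)
accuracy = np.mean(y_hat == y_test)
mcc = multiclass_mcc(y_test, y_hat, n_classes=3)
print(f"accuracy={accuracy:.3f}  MCC={mcc:.3f}")
```

Swapping the L2,1 penalty for another regularizer, such as an independent lasso on each task, would only change `prox_l21` (to element-wise soft-thresholding). The values of `lam`, `lr` and `n_iter` are arbitrary illustrative choices, not settings reported for PC-RMTL.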