SMRT: Randomized Data Transformation for Cancer Subtyping and Big Data Analysis

Cancer is an umbrella term that includes a range of disorders, from those that are fast-growing and lethal to indolent lesions with low or delayed potential for progression to death. The treatment options, as well as treatment success, are highly dependent on the correct subtyping of individual pati...

Bibliographic Details
Main authors: Nguyen, Hung; Tran, Duc; Tran, Bang; Roy, Monikrishna; Cassell, Adam; Dascalu, Sergiu; Draghici, Sorin; Nguyen, Tin
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Oncology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8563705/
https://www.ncbi.nlm.nih.gov/pubmed/34745946
http://dx.doi.org/10.3389/fonc.2021.725133
author Nguyen, Hung
Tran, Duc
Tran, Bang
Roy, Monikrishna
Cassell, Adam
Dascalu, Sergiu
Draghici, Sorin
Nguyen, Tin
collection PubMed
description Cancer is an umbrella term that includes a range of disorders, from those that are fast-growing and lethal to indolent lesions with low or delayed potential for progression to death. The treatment options, as well as treatment success, are highly dependent on the correct subtyping of individual patients. With the advancement of high-throughput platforms, we have the opportunity to differentiate among cancer subtypes from a holistic perspective that takes into consideration phenomena at different molecular levels (mRNA, methylation, etc.). This demands powerful integrative methods to leverage large multi-omics datasets for a better subtyping. Here we introduce Subtyping Multi-omics using a Randomized Transformation (SMRT), a new method for multi-omics integration and cancer subtyping. SMRT offers the following advantages over existing approaches: (i) the scalable analysis pipeline allows researchers to integrate multi-omics data and analyze hundreds of thousands of samples in minutes, (ii) the ability to integrate data types with different numbers of patients, (iii) the ability to analyze un-matched data of different types, and (iv) the ability to offer users a convenient data analysis pipeline through a web application. We also improve the efficiency of our ensemble-based, perturbation clustering to support analysis on machines with memory constraints. In an extensive analysis, we compare SMRT with eight state-of-the-art subtyping methods using 37 TCGA and two METABRIC datasets comprising a total of almost 12,000 patient samples from 28 different types of cancer. We also performed a number of simulation studies. We demonstrate that SMRT outperforms other methods in identifying subtypes with significantly different survival profiles. In addition, SMRT is extremely fast, being able to analyze hundreds of thousands of samples in minutes. The web application is available at http://SMRT.tinnguyen-lab.com. 
The R package will be deposited to CRAN as part of our PINSPlus software suite.
format Online
Article
Text
id pubmed-8563705
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8563705 2021-11-04 SMRT: Randomized Data Transformation for Cancer Subtyping and Big Data Analysis. Nguyen, Hung; Tran, Duc; Tran, Bang; Roy, Monikrishna; Cassell, Adam; Dascalu, Sergiu; Draghici, Sorin; Nguyen, Tin. Front Oncol. Oncology. Frontiers Media S.A. 2021-10-20 /pmc/articles/PMC8563705/ /pubmed/34745946 http://dx.doi.org/10.3389/fonc.2021.725133 Text en Copyright © 2021 Nguyen, Tran, Tran, Roy, Cassell, Dascalu, Draghici and Nguyen https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title SMRT: Randomized Data Transformation for Cancer Subtyping and Big Data Analysis
topic Oncology