
Integrated Cancer Subtyping using Heterogeneous Genome-Scale Molecular Datasets

Vast repositories of heterogeneous data from existing sources present unique opportunities. Taken individually, each of the datasets offers solutions to important domain and source-specific questions. Collectively, they represent complementary views of related data entities with an aggregate informa...


Bibliographic Details
Main authors: Arslanturk, Suzan; Draghici, Sorin; Nguyen, Tin
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6933742/
https://www.ncbi.nlm.nih.gov/pubmed/31797627
_version_ 1783483271241269248
author Arslanturk, Suzan
Draghici, Sorin
Nguyen, Tin
collection PubMed
description Vast repositories of heterogeneous data from existing sources present unique opportunities. Taken individually, each of the datasets offers solutions to important domain- and source-specific questions. Collectively, they represent complementary views of related data entities with an aggregate information value often well exceeding the sum of its parts. Integration of heterogeneous data is therefore paramount to i) obtain a more unified picture and comprehensive view of the relations, ii) achieve more robust results, iii) improve accuracy and integrity, and iv) illuminate the complex interactions among data features. In this paper, we propose a data integration methodology to identify subtypes of cancer using multiple data types (mRNA, methylation, microRNA and somatic variants) and different data scales that come from different platforms (microarray, sequencing, etc.). The Cancer Genome Atlas (TCGA) dataset is used to build the data integration and cancer subtyping framework. The proposed data integration and disease subtyping approach accurately identifies novel subgroups of patients with significantly different survival profiles. With the current availability of vast genomic and variant data for cancer, the proposed data integration system will better differentiate cancer and patient subtypes for risk and outcome prediction and targeted treatment planning without additional cost or loss of precious time.
format Online
Article
Text
id pubmed-6933742
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-6933742 2020-01-01 Integrated Cancer Subtyping using Heterogeneous Genome-Scale Molecular Datasets Arslanturk, Suzan Draghici, Sorin Nguyen, Tin Pac Symp Biocomput Article 2020 /pmc/articles/PMC6933742/ /pubmed/31797627 Text en http://creativecommons.org/licenses/by/4.0/ Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution Non-Commercial (CC BY-NC) 4.0 License.
title Integrated Cancer Subtyping using Heterogeneous Genome-Scale Molecular Datasets
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6933742/
https://www.ncbi.nlm.nih.gov/pubmed/31797627