Integrated Cancer Subtyping using Heterogeneous Genome-Scale Molecular Datasets
Vast repositories of heterogeneous data from existing sources present unique opportunities. Taken individually, each of the datasets offers solutions to important domain and source-specific questions. Collectively, they represent complementary views of related data entities with an aggregate informa...
Main authors: | Arslanturk, Suzan; Draghici, Sorin; Nguyen, Tin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6933742/ https://www.ncbi.nlm.nih.gov/pubmed/31797627 |
_version_ | 1783483271241269248 |
---|---|
author | Arslanturk, Suzan Draghici, Sorin Nguyen, Tin |
author_sort | Arslanturk, Suzan |
collection | PubMed |
description | Vast repositories of heterogeneous data from existing sources present unique opportunities. Taken individually, each of the datasets offers solutions to important domain and source-specific questions. Collectively, they represent complementary views of related data entities with an aggregate information value often well exceeding the sum of its parts. Integration of heterogeneous data is therefore paramount to i) obtain a more unified picture and comprehensive view of the relations, ii) achieve more robust results, iii) improve the accuracy and integrity, and iv) illuminate the complex interactions among data features. In this paper, we have proposed a data integration methodology to identify subtypes of cancer using multiple data types (mRNA, methylation, microRNA and somatic variants) and different data scales that come from different platforms (microarray, sequencing, etc.). The Cancer Genome Atlas (TCGA) dataset is used to build the data integration and cancer subtyping framework. The proposed data integration and disease subtyping approach accurately identifies novel subgroups of patients with significantly different survival profiles. With current availability of vast genomics, and variant data for cancer, the proposed data integration system will better differentiate cancer and patient subtypes for risk and outcome prediction and targeted treatment planning without additional cost and precious lost time. |
format | Online Article Text |
id | pubmed-6933742 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
spelling | pubmed-69337422020-01-01 Integrated Cancer Subtyping using Heterogeneous Genome-Scale Molecular Datasets Arslanturk, Suzan Draghici, Sorin Nguyen, Tin Pac Symp Biocomput Article Vast repositories of heterogeneous data from existing sources present unique opportunities. Taken individually, each of the datasets offers solutions to important domain and source-specific questions. Collectively, they represent complementary views of related data entities with an aggregate information value often well exceeding the sum of its parts. Integration of heterogeneous data is therefore paramount to i) obtain a more unified picture and comprehensive view of the relations, ii) achieve more robust results, iii) improve the accuracy and integrity, and iv) illuminate the complex interactions among data features. In this paper, we have proposed a data integration methodology to identify subtypes of cancer using multiple data types (mRNA, methylation, microRNA and somatic variants) and different data scales that come from different platforms (microarray, sequencing, etc.). The Cancer Genome Atlas (TCGA) dataset is used to build the data integration and cancer subtyping framework. The proposed data integration and disease subtyping approach accurately identifies novel subgroups of patients with significantly different survival profiles. With current availability of vast genomics, and variant data for cancer, the proposed data integration system will better differentiate cancer and patient subtypes for risk and outcome prediction and targeted treatment planning without additional cost and precious lost time. 2020 /pmc/articles/PMC6933742/ /pubmed/31797627 Text en http://creativecommons.org/licenses/by/4.0/ Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution Non-Commercial (CC BY-NC) 4.0 License. |
title | Integrated Cancer Subtyping using Heterogeneous Genome-Scale Molecular Datasets |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6933742/ https://www.ncbi.nlm.nih.gov/pubmed/31797627 |