Multiomic Integration of Public Oncology Databases in Bioconductor

PURPOSE: Investigations of the molecular basis for the development, progression, and treatment of cancer increasingly use complementary genomic assays to gather multiomic data, but management and analysis of such data remain complex. The cBioPortal for cancer genomics currently provides multiomic da...

Bibliographic Details
Main Authors: Ramos, Marcel; Geistlinger, Ludwig; Oh, Sehyun; Schiffer, Lucas; Azhar, Rimsha; Kodali, Hanish; de Bruijn, Ino; Gao, Jianjiong; Carey, Vincent J.; Morgan, Martin; Waldron, Levi
Format: Online Article Text
Language: English
Published: American Society of Clinical Oncology 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7608653/
https://www.ncbi.nlm.nih.gov/pubmed/33119407
http://dx.doi.org/10.1200/CCI.19.00119
author Ramos, Marcel
Geistlinger, Ludwig
Oh, Sehyun
Schiffer, Lucas
Azhar, Rimsha
Kodali, Hanish
de Bruijn, Ino
Gao, Jianjiong
Carey, Vincent J.
Morgan, Martin
Waldron, Levi
author_sort Ramos, Marcel
collection PubMed
description PURPOSE: Investigations of the molecular basis for the development, progression, and treatment of cancer increasingly use complementary genomic assays to gather multiomic data, but management and analysis of such data remain complex. The cBioPortal for cancer genomics currently provides multiomic data from > 260 public studies, including The Cancer Genome Atlas (TCGA) data sets, but integration of different data types remains challenging and error prone for computational methods and tools using these resources. Recent advances in data infrastructure within the Bioconductor project enable a novel and powerful approach to creating fully integrated representations of these multiomic, pan-cancer databases. METHODS: We provide a set of R/Bioconductor packages for working with TCGA legacy data and cBioPortal data, with special considerations for loading time; efficient representations in and out of memory; analysis platform; and an integrative framework, such as MultiAssayExperiment. Large methylation data sets are provided through out-of-memory data representation to provide responsive loading times and analysis capabilities on machines with limited memory. RESULTS: We developed the curatedTCGAData and cBioPortalData R/Bioconductor packages to provide integrated multiomic data sets from the TCGA legacy database and the cBioPortal web application programming interface using the MultiAssayExperiment data structure. This suite of tools provides coordination of diverse experimental assays with clinicopathological data with minimal data management burden, as demonstrated through several greatly simplified multiomic and pan-cancer analyses. CONCLUSION: These integrated representations enable analysts and tool developers to apply general statistical and plotting methods to extensive multiomic data through user-friendly commands and documented examples.
format Online
Article
Text
id pubmed-7608653
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher American Society of Clinical Oncology
record_format MEDLINE/PubMed
spelling pubmed-7608653 2021-10-29 JCO Clin Cancer Inform ORIGINAL REPORTS American Society of Clinical Oncology 2020-10-29 /pmc/articles/PMC7608653/ /pubmed/33119407 http://dx.doi.org/10.1200/CCI.19.00119 Text en © 2020 by American Society of Clinical Oncology. Licensed under the Creative Commons Attribution 4.0 License: https://creativecommons.org/licenses/by/4.0/
title Multiomic Integration of Public Oncology Databases in Bioconductor
topic ORIGINAL REPORTS
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7608653/
https://www.ncbi.nlm.nih.gov/pubmed/33119407
http://dx.doi.org/10.1200/CCI.19.00119