Multiomic Integration of Public Oncology Databases in Bioconductor
PURPOSE: Investigations of the molecular basis for the development, progression, and treatment of cancer increasingly use complementary genomic assays to gather multiomic data, but management and analysis of such data remain complex. The cBioPortal for cancer genomics currently provides multiomic da...
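Main authors: | Ramos, Marcel; Geistlinger, Ludwig; Oh, Sehyun; Schiffer, Lucas; Azhar, Rimsha; Kodali, Hanish; de Bruijn, Ino; Gao, Jianjiong; Carey, Vincent J.; Morgan, Martin; Waldron, Levi |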
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
American Society of Clinical Oncology
2020
|
Subjects: | ORIGINAL REPORTS |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7608653/ https://www.ncbi.nlm.nih.gov/pubmed/33119407 http://dx.doi.org/10.1200/CCI.19.00119 |
_version_ | 1783604879861743616 |
---|---|
author | Ramos, Marcel; Geistlinger, Ludwig; Oh, Sehyun; Schiffer, Lucas; Azhar, Rimsha; Kodali, Hanish; de Bruijn, Ino; Gao, Jianjiong; Carey, Vincent J.; Morgan, Martin; Waldron, Levi |
author_facet | Ramos, Marcel; Geistlinger, Ludwig; Oh, Sehyun; Schiffer, Lucas; Azhar, Rimsha; Kodali, Hanish; de Bruijn, Ino; Gao, Jianjiong; Carey, Vincent J.; Morgan, Martin; Waldron, Levi |
author_sort | Ramos, Marcel |
collection | PubMed |
description | PURPOSE: Investigations of the molecular basis for the development, progression, and treatment of cancer increasingly use complementary genomic assays to gather multiomic data, but management and analysis of such data remain complex. The cBioPortal for cancer genomics currently provides multiomic data from > 260 public studies, including The Cancer Genome Atlas (TCGA) data sets, but integration of different data types remains challenging and error prone for computational methods and tools using these resources. Recent advances in data infrastructure within the Bioconductor project enable a novel and powerful approach to creating fully integrated representations of these multiomic, pan-cancer databases. METHODS: We provide a set of R/Bioconductor packages for working with TCGA legacy data and cBioPortal data, with special considerations for loading time; efficient representations in and out of memory; analysis platform; and an integrative framework, such as MultiAssayExperiment. Large methylation data sets are provided through out-of-memory data representation to provide responsive loading times and analysis capabilities on machines with limited memory. RESULTS: We developed the curatedTCGAData and cBioPortalData R/Bioconductor packages to provide integrated multiomic data sets from the TCGA legacy database and the cBioPortal web application programming interface using the MultiAssayExperiment data structure. This suite of tools provides coordination of diverse experimental assays with clinicopathological data with minimal data management burden, as demonstrated through several greatly simplified multiomic and pan-cancer analyses. CONCLUSION: These integrated representations enable analysts and tool developers to apply general statistical and plotting methods to extensive multiomic data through user-friendly commands and documented examples. |
format | Online Article Text |
id | pubmed-7608653 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | American Society of Clinical Oncology |
record_format | MEDLINE/PubMed |
spelling | pubmed-7608653 2021-10-29 Multiomic Integration of Public Oncology Databases in Bioconductor. Ramos, Marcel; Geistlinger, Ludwig; Oh, Sehyun; Schiffer, Lucas; Azhar, Rimsha; Kodali, Hanish; de Bruijn, Ino; Gao, Jianjiong; Carey, Vincent J.; Morgan, Martin; Waldron, Levi. JCO Clin Cancer Inform. ORIGINAL REPORTS. American Society of Clinical Oncology 2020-10-29 /pmc/articles/PMC7608653/ /pubmed/33119407 http://dx.doi.org/10.1200/CCI.19.00119 Text en © 2020 by American Society of Clinical Oncology. Licensed under the Creative Commons Attribution 4.0 License: https://creativecommons.org/licenses/by/4.0/ |
title | Multiomic Integration of Public Oncology Databases in Bioconductor |
topic | ORIGINAL REPORTS |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7608653/ https://www.ncbi.nlm.nih.gov/pubmed/33119407 http://dx.doi.org/10.1200/CCI.19.00119 |