
A self-updating road map of The Cancer Genome Atlas

Motivation: Since 2011, The Cancer Genome Atlas’ (TCGA) files have been accessible through HTTP from a public site, creating entirely new possibilities for cancer informatics by enhancing data discovery and retrieval. Significantly, these enhancements enable the reporting of analysis results that ca...


Bibliographic Details
Main Authors: Robbins, David E., Grüneberg, Alexander, Deus, Helena F., Tanik, Murat M., Almeida, Jonas S.
Format: Online Article Text
Language: English
Published: Oxford University Press 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3654710/
https://www.ncbi.nlm.nih.gov/pubmed/23595662
http://dx.doi.org/10.1093/bioinformatics/btt141
author Robbins, David E.
Grüneberg, Alexander
Deus, Helena F.
Tanik, Murat M.
Almeida, Jonas S.
collection PubMed
description Motivation: Since 2011, The Cancer Genome Atlas’ (TCGA) files have been accessible through HTTP from a public site, creating entirely new possibilities for cancer informatics by enhancing data discovery and retrieval. Significantly, these enhancements enable the reporting of analysis results that can be fully traced to and reproduced using their source data. However, to realize this possibility, a continually updated road map of files in the TCGA is required. Creation of such a road map represents a significant data modeling challenge, due to the size and fluidity of this resource: each of the 33 cancer types is instantiated in only partially overlapping sets of analytical platforms, while the number of data files available doubles approximately every 7 months.
Results: We developed an engine to index and annotate the TCGA files, relying exclusively on third-generation web technologies (Web 3.0). Specifically, this engine uses JavaScript in conjunction with the World Wide Web Consortium’s (W3C) Resource Description Framework (RDF), and SPARQL, the query language for RDF, to capture metadata of files in the TCGA open-access HTTP directory. The resulting index may be queried using SPARQL, and enables file-level provenance annotations as well as discovery of arbitrary subsets of files, based on their metadata, using web standard languages. In turn, these abilities enhance the reproducibility and distribution of novel results delivered as elements of a web-based computational ecosystem. The development of the TCGA Roadmap engine was found to provide specific clues about how biomedical big data initiatives should be exposed as public resources for exploratory analysis, data mining and reproducible research. These specific design elements align with the concept of knowledge reengineering and represent a sharp departure from top-down approaches in grid initiatives such as CaBIG. They also present a much more interoperable and reproducible alternative to the still pervasive use of data portals.
Availability: A prepared dashboard, including links to source code and a SPARQL endpoint, is available at http://bit.ly/TCGARoadmap. A video tutorial is available at http://bit.ly/TCGARoadmapTutorial.
Contact: robbinsd@uab.edu
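The idea of discovering arbitrary subsets of files from their metadata can be illustrated with a minimal sketch. It assumes that each file's metadata is stored as subject-predicate-object triples, as in RDF, and that a query is a conjunction of predicate-value constraints, which is roughly what a basic SPARQL graph pattern expresses. The file names, predicate names (`diseaseStudy`, `platform`), and values are hypothetical placeholders and are not taken from the TCGA Roadmap vocabulary; the in-memory list stands in for a real triple store and SPARQL endpoint.

```python
from typing import List, Optional, Tuple

# A triple is (subject, predicate, object), mirroring the RDF data model.
Triple = Tuple[str, str, str]

# Hypothetical index: file names, predicates, and values are placeholders,
# not the vocabulary used by the TCGA Roadmap engine.
INDEX: List[Triple] = [
    ("file1.txt", "diseaseStudy", "gbm"),
    ("file1.txt", "platform", "platformA"),
    ("file2.txt", "diseaseStudy", "gbm"),
    ("file2.txt", "platform", "platformB"),
    ("file3.txt", "diseaseStudy", "ov"),
    ("file3.txt", "platform", "platformA"),
]


def match(index: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in index
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def select_files(index: List[Triple], **criteria: str) -> List[str]:
    """Return the sorted subjects that satisfy every predicate=value constraint."""
    selected = None
    for predicate, value in criteria.items():
        subjects = {s for s, _, _ in match(index, p=predicate, o=value)}
        # Intersect so that a file must satisfy all constraints at once.
        selected = subjects if selected is None else selected & subjects
    return sorted(selected) if selected is not None else []


if __name__ == "__main__":
    print(select_files(INDEX, diseaseStudy="gbm", platform="platformA"))
```

With this toy index, asking for files that are both in the hypothetical `gbm` study and on `platformA` selects only `file1.txt`. A real deployment would send the equivalent SPARQL query to the endpoint linked from the dashboard rather than filter a Python list.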
format Online
Article
Text
id pubmed-3654710
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3654710 2013-05-17 A self-updating road map of The Cancer Genome Atlas. Bioinformatics, Original Papers. Oxford University Press 2013-05-15 2013-04-17 /pmc/articles/PMC3654710/ /pubmed/23595662 http://dx.doi.org/10.1093/bioinformatics/btt141 Text en © The Author(s) 2013. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title A self-updating road map of The Cancer Genome Atlas
topic Original Papers