A self-updating road map of The Cancer Genome Atlas
Motivation: Since 2011, The Cancer Genome Atlas’ (TCGA) files have been accessible through HTTP from a public site, creating entirely new possibilities for cancer informatics by enhancing data discovery and retrieval. Significantly, these enhancements enable the reporting of analysis results that ca...
Main Authors: | Robbins, David E., Grüneberg, Alexander, Deus, Helena F., Tanik, Murat M., Almeida, Jonas S. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2013 |
Subjects: | Original Papers |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3654710/ https://www.ncbi.nlm.nih.gov/pubmed/23595662 http://dx.doi.org/10.1093/bioinformatics/btt141 |
_version_ | 1782476065157939200 |
---|---|
author | Robbins, David E. Grüneberg, Alexander Deus, Helena F. Tanik, Murat M. Almeida, Jonas S. |
author_facet | Robbins, David E. Grüneberg, Alexander Deus, Helena F. Tanik, Murat M. Almeida, Jonas S. |
author_sort | Robbins, David E. |
collection | PubMed |
description | Motivation: Since 2011, The Cancer Genome Atlas’ (TCGA) files have been accessible through HTTP from a public site, creating entirely new possibilities for cancer informatics by enhancing data discovery and retrieval. Significantly, these enhancements enable the reporting of analysis results that can be fully traced to and reproduced using their source data. However, to realize this possibility, a continually updated road map of files in the TCGA is required. Creation of such a road map represents a significant data modeling challenge, due to the size and fluidity of this resource: each of the 33 cancer types is instantiated in only partially overlapping sets of analytical platforms, while the number of data files available doubles approximately every 7 months. Results: We developed an engine to index and annotate the TCGA files, relying exclusively on third-generation web technologies (Web 3.0). Specifically, this engine uses JavaScript in conjunction with the World Wide Web Consortium’s (W3C) Resource Description Framework (RDF), and SPARQL, the query language for RDF, to capture metadata of files in the TCGA open-access HTTP directory. The resulting index may be queried using SPARQL, and enables file-level provenance annotations as well as discovery of arbitrary subsets of files, based on their metadata, using web standard languages. In turn, these abilities enhance the reproducibility and distribution of novel results delivered as elements of a web-based computational ecosystem. The development of the TCGA Roadmap engine was found to provide specific clues about how biomedical big data initiatives should be exposed as public resources for exploratory analysis, data mining and reproducible research. These specific design elements align with the concept of knowledge reengineering and represent a sharp departure from top-down approaches in grid initiatives such as CaBIG. They also present a much more interoperable and reproducible alternative to the still pervasive use of data portals. Availability: A prepared dashboard, including links to source code and a SPARQL endpoint, is available at http://bit.ly/TCGARoadmap. A video tutorial is available at http://bit.ly/TCGARoadmapTutorial. Contact: robbinsd@uab.edu |
format | Online Article Text |
id | pubmed-3654710 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-3654710 2013-05-17 A self-updating road map of The Cancer Genome Atlas Robbins, David E. Grüneberg, Alexander Deus, Helena F. Tanik, Murat M. Almeida, Jonas S. Bioinformatics Original Papers Motivation: Since 2011, The Cancer Genome Atlas’ (TCGA) files have been accessible through HTTP from a public site, creating entirely new possibilities for cancer informatics by enhancing data discovery and retrieval. Significantly, these enhancements enable the reporting of analysis results that can be fully traced to and reproduced using their source data. However, to realize this possibility, a continually updated road map of files in the TCGA is required. Creation of such a road map represents a significant data modeling challenge, due to the size and fluidity of this resource: each of the 33 cancer types is instantiated in only partially overlapping sets of analytical platforms, while the number of data files available doubles approximately every 7 months. Results: We developed an engine to index and annotate the TCGA files, relying exclusively on third-generation web technologies (Web 3.0). Specifically, this engine uses JavaScript in conjunction with the World Wide Web Consortium’s (W3C) Resource Description Framework (RDF), and SPARQL, the query language for RDF, to capture metadata of files in the TCGA open-access HTTP directory. The resulting index may be queried using SPARQL, and enables file-level provenance annotations as well as discovery of arbitrary subsets of files, based on their metadata, using web standard languages. In turn, these abilities enhance the reproducibility and distribution of novel results delivered as elements of a web-based computational ecosystem. The development of the TCGA Roadmap engine was found to provide specific clues about how biomedical big data initiatives should be exposed as public resources for exploratory analysis, data mining and reproducible research. These specific design elements align with the concept of knowledge reengineering and represent a sharp departure from top-down approaches in grid initiatives such as CaBIG. They also present a much more interoperable and reproducible alternative to the still pervasive use of data portals. Availability: A prepared dashboard, including links to source code and a SPARQL endpoint, is available at http://bit.ly/TCGARoadmap. A video tutorial is available at http://bit.ly/TCGARoadmapTutorial. Contact: robbinsd@uab.edu Oxford University Press 2013-05-15 2013-04-17 /pmc/articles/PMC3654710/ /pubmed/23595662 http://dx.doi.org/10.1093/bioinformatics/btt141 Text en © The Author(s) 2013. Published by Oxford University Press. http://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Papers Robbins, David E. Grüneberg, Alexander Deus, Helena F. Tanik, Murat M. Almeida, Jonas S. A self-updating road map of The Cancer Genome Atlas |
title | A self-updating road map of The Cancer Genome Atlas |
title_full | A self-updating road map of The Cancer Genome Atlas |
title_fullStr | A self-updating road map of The Cancer Genome Atlas |
title_full_unstemmed | A self-updating road map of The Cancer Genome Atlas |
title_short | A self-updating road map of The Cancer Genome Atlas |
title_sort | self-updating road map of the cancer genome atlas |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3654710/ https://www.ncbi.nlm.nih.gov/pubmed/23595662 http://dx.doi.org/10.1093/bioinformatics/btt141 |
work_keys_str_mv | AT robbinsdavide aselfupdatingroadmapofthecancergenomeatlas AT grunebergalexander aselfupdatingroadmapofthecancergenomeatlas AT deushelenaf aselfupdatingroadmapofthecancergenomeatlas AT tanikmuratm aselfupdatingroadmapofthecancergenomeatlas AT almeidajonass aselfupdatingroadmapofthecancergenomeatlas AT robbinsdavide selfupdatingroadmapofthecancergenomeatlas AT grunebergalexander selfupdatingroadmapofthecancergenomeatlas AT deushelenaf selfupdatingroadmapofthecancergenomeatlas AT tanikmuratm selfupdatingroadmapofthecancergenomeatlas AT almeidajonass selfupdatingroadmapofthecancergenomeatlas |
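
The abstract describes indexing file metadata as RDF and selecting arbitrary subsets of files by metadata with SPARQL. The sketch below is only a rough illustration of that idea, not the authors' engine: it uses a plain in-memory list of subject/predicate/object triples in place of an RDF store, and a conjunctive pattern match in place of a SPARQL basic graph pattern. The file identifiers (`file:1` and so on), the predicate names (`diseaseStudy`, `platform`), and their values are all hypothetical and do not come from the TCGA Roadmap.

```python
from typing import Optional

# A triple is (subject, predicate, object), mirroring the RDF data model.
Triple = tuple[str, str, str]

# Hypothetical metadata for three files; none of these names are taken
# from the actual TCGA Roadmap index.
TRIPLES: list[Triple] = [
    ("file:1", "diseaseStudy", "gbm"),
    ("file:1", "platform", "expression-array"),
    ("file:2", "diseaseStudy", "gbm"),
    ("file:2", "platform", "snp-array"),
    ("file:3", "diseaseStudy", "ov"),
    ("file:3", "platform", "expression-array"),
]


def match(
    triples: list[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> list[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def files_with(triples: list[Triple], **criteria: str) -> list[str]:
    """Return subjects satisfying every predicate=value criterion.

    The criteria are combined conjunctively, which is roughly what a
    SPARQL query with several triple patterns on one variable does.
    """
    result: Optional[set[str]] = None
    for predicate, value in criteria.items():
        subjects = {t[0] for t in match(triples, p=predicate, o=value)}
        result = subjects if result is None else result & subjects
    return sorted(result or [])


if __name__ == "__main__":
    print(files_with(TRIPLES, diseaseStudy="gbm", platform="expression-array"))
```

Because each file is described by independent triples, a new metadata property can be added without changing a schema, which is one reason a triple-based index suits a resource whose platforms overlap only partially across cancer types.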