TCGA2BED: extracting, extending, integrating, and querying The Cancer Genome Atlas

BACKGROUND: Data extraction and integration methods are becoming essential to effectively access and take advantage of the huge amounts of heterogeneous genomics and clinical data increasingly available. In this work, we focus on The Cancer Genome Atlas, a comprehensive archive of tumoral data conta...

Bibliographic Details
Main authors: Cumbo, Fabio; Fiscon, Giulia; Ceri, Stefano; Masseroli, Marco; Weitschek, Emanuel
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5210259/
https://www.ncbi.nlm.nih.gov/pubmed/28049410
http://dx.doi.org/10.1186/s12859-016-1419-5
_version_ 1782490847460196352
author Cumbo, Fabio
Fiscon, Giulia
Ceri, Stefano
Masseroli, Marco
Weitschek, Emanuel
author_facet Cumbo, Fabio
Fiscon, Giulia
Ceri, Stefano
Masseroli, Marco
Weitschek, Emanuel
author_sort Cumbo, Fabio
collection PubMed
description BACKGROUND: Data extraction and integration methods are becoming essential to effectively access and take advantage of the huge amounts of heterogeneous genomics and clinical data increasingly available. In this work, we focus on The Cancer Genome Atlas, a comprehensive archive of tumoral data containing the results of high-throughput experiments, mainly Next Generation Sequencing, for more than 30 cancer types. RESULTS: We propose TCGA2BED, a software tool to search and retrieve TCGA data and convert them into the structured BED format for their seamless use and integration. Additionally, it supports conversion into the CSV, GTF, JSON, and XML standard formats. Furthermore, TCGA2BED extends TCGA data with information extracted from other genomic databases (i.e., NCBI Entrez Gene, HGNC, UCSC, and miRBase). We also provide and maintain an automatically updated data repository with publicly available Copy Number Variation, DNA-methylation, DNA-seq, miRNA-seq, and RNA-seq (V1, V2) experimental data of TCGA converted into the BED format, and their associated clinical and biospecimen metadata in attribute-value text format. CONCLUSIONS: The availability of the valuable TCGA data in BED format reduces the time needed to take advantage of them: it becomes possible to deal efficiently and effectively with huge amounts of cancer genomic data in an integrated way, and to search, retrieve, and extend them with additional information. The BED format supports investigators by enabling several knowledge discovery analyses on all tumor types in TCGA, with the final aim of understanding pathological mechanisms and aiding cancer treatments. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1419-5) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5210259
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5210259 2017-01-06 TCGA2BED: extracting, extending, integrating, and querying The Cancer Genome Atlas Cumbo, Fabio Fiscon, Giulia Ceri, Stefano Masseroli, Marco Weitschek, Emanuel BMC Bioinformatics Software BACKGROUND: Data extraction and integration methods are becoming essential to effectively access and take advantage of the huge amounts of heterogeneous genomics and clinical data increasingly available. In this work, we focus on The Cancer Genome Atlas, a comprehensive archive of tumoral data containing the results of high-throughput experiments, mainly Next Generation Sequencing, for more than 30 cancer types. RESULTS: We propose TCGA2BED, a software tool to search and retrieve TCGA data and convert them into the structured BED format for their seamless use and integration. Additionally, it supports conversion into the CSV, GTF, JSON, and XML standard formats. Furthermore, TCGA2BED extends TCGA data with information extracted from other genomic databases (i.e., NCBI Entrez Gene, HGNC, UCSC, and miRBase). We also provide and maintain an automatically updated data repository with publicly available Copy Number Variation, DNA-methylation, DNA-seq, miRNA-seq, and RNA-seq (V1, V2) experimental data of TCGA converted into the BED format, and their associated clinical and biospecimen metadata in attribute-value text format. CONCLUSIONS: The availability of the valuable TCGA data in BED format reduces the time needed to take advantage of them: it becomes possible to deal efficiently and effectively with huge amounts of cancer genomic data in an integrated way, and to search, retrieve, and extend them with additional information. The BED format supports investigators by enabling several knowledge discovery analyses on all tumor types in TCGA, with the final aim of understanding pathological mechanisms and aiding cancer treatments.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1419-5) contains supplementary material, which is available to authorized users. BioMed Central 2017-01-03 /pmc/articles/PMC5210259/ /pubmed/28049410 http://dx.doi.org/10.1186/s12859-016-1419-5 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Software
Cumbo, Fabio
Fiscon, Giulia
Ceri, Stefano
Masseroli, Marco
Weitschek, Emanuel
TCGA2BED: extracting, extending, integrating, and querying The Cancer Genome Atlas
title TCGA2BED: extracting, extending, integrating, and querying The Cancer Genome Atlas
title_full TCGA2BED: extracting, extending, integrating, and querying The Cancer Genome Atlas
title_fullStr TCGA2BED: extracting, extending, integrating, and querying The Cancer Genome Atlas
title_full_unstemmed TCGA2BED: extracting, extending, integrating, and querying The Cancer Genome Atlas
title_short TCGA2BED: extracting, extending, integrating, and querying The Cancer Genome Atlas
title_sort tcga2bed: extracting, extending, integrating, and querying the cancer genome atlas
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5210259/
https://www.ncbi.nlm.nih.gov/pubmed/28049410
http://dx.doi.org/10.1186/s12859-016-1419-5
work_keys_str_mv AT cumbofabio tcga2bedextractingextendingintegratingandqueryingthecancergenomeatlas
AT fiscongiulia tcga2bedextractingextendingintegratingandqueryingthecancergenomeatlas
AT ceristefano tcga2bedextractingextendingintegratingandqueryingthecancergenomeatlas
AT masserolimarco tcga2bedextractingextendingintegratingandqueryingthecancergenomeatlas
AT weitschekemanuel tcga2bedextractingextendingintegratingandqueryingthecancergenomeatlas