
Uniform genomic data analysis in the NCI Genomic Data Commons

The goal of the National Cancer Institute’s (NCI’s) Genomic Data Commons (GDC) is to provide the cancer research community with a data repository of uniformly processed genomic and associated clinical data that enables data sharing and collaborative analysis in support of precision medicine. The...


Bibliographic Details
Main authors: Zhang, Zhenyu, Hernandez, Kyle, Savage, Jeremiah, Li, Shenglai, Miller, Dan, Agrawal, Stuti, Ortuno, Francisco, Staudt, Louis M., Heath, Allison, Grossman, Robert L.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7900240/
https://www.ncbi.nlm.nih.gov/pubmed/33619257
http://dx.doi.org/10.1038/s41467-021-21254-9
_version_ 1783654184051015680
author Zhang, Zhenyu
Hernandez, Kyle
Savage, Jeremiah
Li, Shenglai
Miller, Dan
Agrawal, Stuti
Ortuno, Francisco
Staudt, Louis M.
Heath, Allison
Grossman, Robert L.
author_sort Zhang, Zhenyu
collection PubMed
description The goal of the National Cancer Institute’s (NCI’s) Genomic Data Commons (GDC) is to provide the cancer research community with a data repository of uniformly processed genomic and associated clinical data that enables data sharing and collaborative analysis in support of precision medicine. The initial GDC dataset includes genomic, epigenomic, proteomic, clinical and other data from the NCI TCGA and TARGET programs. Data production for the GDC started in June 2015 using an OpenStack-based private cloud. By June 2016, the GDC had analyzed more than 50,000 raw sequencing data inputs, as well as multiple other data types. Using the latest human genome reference build GRCh38, the GDC generated a variety of data types from aligned reads to somatic mutations, gene expression, miRNA expression, DNA methylation status, and copy number variation. In this paper, we describe the pipelines and workflows used to process and harmonize the data in the GDC. The generated data, as well as the original input files from TCGA and TARGET, are available for download and exploratory analysis at the GDC Data Portal and Legacy Archive (https://gdc.cancer.gov/).
format Online
Article
Text
id pubmed-7900240
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-7900240 2021-03-05 Nat Commun Article Nature Publishing Group UK 2021-02-22 /pmc/articles/PMC7900240/ /pubmed/33619257 http://dx.doi.org/10.1038/s41467-021-21254-9 Text en © The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Uniform genomic data analysis in the NCI Genomic Data Commons
topic Article