
Uniform genomic data analysis in the NCI Genomic Data Commons

The goal of the National Cancer Institute’s (NCI’s) Genomic Data Commons (GDC) is to provide the cancer research community with a data repository of uniformly processed genomic and associated clinical data that enables data sharing and collaborative analysis in the support of precision medicine.


Bibliographic Details
Main authors:	Zhang, Zhenyu; Hernandez, Kyle; Savage, Jeremiah; Li, Shenglai; Miller, Dan; Agrawal, Stuti; Ortuno, Francisco; Staudt, Louis M.; Heath, Allison; Grossman, Robert L.
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK, 2021
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7900240/ https://www.ncbi.nlm.nih.gov/pubmed/33619257 http://dx.doi.org/10.1038/s41467-021-21254-9

author	Zhang, Zhenyu; Hernandez, Kyle; Savage, Jeremiah; Li, Shenglai; Miller, Dan; Agrawal, Stuti; Ortuno, Francisco; Staudt, Louis M.; Heath, Allison; Grossman, Robert L.
collection	PubMed
description	The goal of the National Cancer Institute’s (NCI’s) Genomic Data Commons (GDC) is to provide the cancer research community with a data repository of uniformly processed genomic and associated clinical data that enables data sharing and collaborative analysis in the support of precision medicine. The initial GDC dataset includes genomic, epigenomic, proteomic, clinical and other data from the NCI TCGA and TARGET programs. Data production for the GDC started in June 2015 using an OpenStack-based private cloud. By June of 2016, the GDC had analyzed more than 50,000 raw sequencing data inputs, as well as multiple other data types. Using the latest human genome reference build GRCh38, the GDC generated a variety of data types from aligned reads to somatic mutations, gene expression, miRNA expression, DNA methylation status, and copy number variation. In this paper, we describe the pipelines and workflows used to process and harmonize the data in the GDC. The generated data, as well as the original input files from TCGA and TARGET, are available for download and exploratory analysis at the GDC Data Portal and Legacy Archive (https://gdc.cancer.gov/).
format	Online Article Text
id	pubmed-7900240
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-7900240 2021-03-05. Nat Commun. Nature Publishing Group UK, 2021-02-22. © The Author(s) 2021. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title	Uniform genomic data analysis in the NCI Genomic Data Commons
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7900240/ https://www.ncbi.nlm.nih.gov/pubmed/33619257 http://dx.doi.org/10.1038/s41467-021-21254-9