Uniform genomic data analysis in the NCI Genomic Data Commons
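Main authors: | Zhang, Zhenyu; Hernandez, Kyle; Savage, Jeremiah; Li, Shenglai; Miller, Dan; Agrawal, Stuti; Ortuno, Francisco; Staudt, Louis M.; Heath, Allison; Grossman, Robert L. |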
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2021 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7900240/ https://www.ncbi.nlm.nih.gov/pubmed/33619257 http://dx.doi.org/10.1038/s41467-021-21254-9 |
author | Zhang, Zhenyu; Hernandez, Kyle; Savage, Jeremiah; Li, Shenglai; Miller, Dan; Agrawal, Stuti; Ortuno, Francisco; Staudt, Louis M.; Heath, Allison; Grossman, Robert L. |
collection | PubMed |
description | The goal of the National Cancer Institute’s (NCI’s) Genomic Data Commons (GDC) is to provide the cancer research community with a data repository of uniformly processed genomic and associated clinical data that enables data sharing and collaborative analysis in support of precision medicine. The initial GDC dataset includes genomic, epigenomic, proteomic, clinical and other data from the NCI TCGA and TARGET programs. Data production for the GDC started in June 2015 using an OpenStack-based private cloud. By June 2016, the GDC had analyzed more than 50,000 raw sequencing data inputs, as well as multiple other data types. Using the latest human genome reference build, GRCh38, the GDC generated a variety of data types, from aligned reads to somatic mutations, gene expression, miRNA expression, DNA methylation status, and copy number variation. In this paper, we describe the pipelines and workflows used to process and harmonize the data in the GDC. The generated data, as well as the original input files from TCGA and TARGET, are available for download and exploratory analysis at the GDC Data Portal and Legacy Archive (https://gdc.cancer.gov/). |
format | Online Article Text |
id | pubmed-7900240 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
journal | Nat Commun |
publication_date | 2021-02-22 |
license | © The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
title | Uniform genomic data analysis in the NCI Genomic Data Commons |
topic | Article |