Whole Genome Variant Dataset for Enriching Studies across 18 Different Cancers

Bibliographic Details
Main Authors: Torcivia, John; Abdilleh, Kawther; Seidl, Fabian; Shahzada, Owais; Rodriguez, Rebecca; Pot, David; Mazumder, Raja
Format: Online Article Text
Language: English
Published: 2022
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10571071/
https://www.ncbi.nlm.nih.gov/pubmed/37841494
http://dx.doi.org/10.3390/onco2020009
author Torcivia, John
Abdilleh, Kawther
Seidl, Fabian
Shahzada, Owais
Rodriguez, Rebecca
Pot, David
Mazumder, Raja
collection PubMed
description Whole genome sequencing (WGS) has helped to revolutionize biology, but the computational challenge remains for extracting valuable inferences from this information. Here, we present the cancer-associated variants from the Cancer Genome Atlas (TCGA) WGS dataset. This set of data will allow cancer researchers to further expand their analysis beyond the exomic regions of the genome to the entire genome. A total of 1342 WGS alignments available from the consortium were processed with VarScan2 and deposited to the NCI Cancer Cloud. The sample set covers 18 different cancers and reveals 157,313,519 pooled (non-unique) cancer-associated single-nucleotide variations (SNVs) across all samples. There was an average of 117,223 SNVs per sample, with a range from 1111 to 775,470 and a standard deviation of 163,273. The dataset was incorporated into BigQuery, which allows for fast access and cross-mapping, which will allow researchers to enrich their current studies with a plethora of newly available genomic data.
format Online
Article
Text
id pubmed-10571071
institution National Center for Biotechnology Information
language English
publishDate 2022
record_format MEDLINE/PubMed
spelling pubmed-10571071 2023-10-13 Whole Genome Variant Dataset for Enriching Studies across 18 Different Cancers Torcivia, John Abdilleh, Kawther Seidl, Fabian Shahzada, Owais Rodriguez, Rebecca Pot, David Mazumder, Raja Onco (Basel) Article Whole genome sequencing (WGS) has helped to revolutionize biology, but the computational challenge remains for extracting valuable inferences from this information. Here, we present the cancer-associated variants from the Cancer Genome Atlas (TCGA) WGS dataset. This set of data will allow cancer researchers to further expand their analysis beyond the exomic regions of the genome to the entire genome. A total of 1342 WGS alignments available from the consortium were processed with VarScan2 and deposited to the NCI Cancer Cloud. The sample set covers 18 different cancers and reveals 157,313,519 pooled (non-unique) cancer-associated single-nucleotide variations (SNVs) across all samples. There was an average of 117,223 SNVs per sample, with a range from 1111 to 775,470 and a standard deviation of 163,273. The dataset was incorporated into BigQuery, which allows for fast access and cross-mapping, which will allow researchers to enrich their current studies with a plethora of newly available genomic data. 2022-06 2022-06-17 /pmc/articles/PMC10571071/ /pubmed/37841494 http://dx.doi.org/10.3390/onco2020009 Text en https://creativecommons.org/licenses/by/4.0/ This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Whole Genome Variant Dataset for Enriching Studies across 18 Different Cancers
topic Article