Whole Genome Variant Dataset for Enriching Studies across 18 Different Cancers
Whole genome sequencing (WGS) has helped to revolutionize biology, but the computational challenge remains for extracting valuable inferences from this information. Here, we present the cancer-associated variants from the Cancer Genome Atlas (TCGA) WGS dataset. This set of data will allow cancer res...
Main authors: | Torcivia, John; Abdilleh, Kawther; Seidl, Fabian; Shahzada, Owais; Rodriguez, Rebecca; Pot, David; Mazumder, Raja |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10571071/ https://www.ncbi.nlm.nih.gov/pubmed/37841494 http://dx.doi.org/10.3390/onco2020009 |
_version_ | 1785119903391416320 |
---|---|
author | Torcivia, John Abdilleh, Kawther Seidl, Fabian Shahzada, Owais Rodriguez, Rebecca Pot, David Mazumder, Raja |
author_facet | Torcivia, John Abdilleh, Kawther Seidl, Fabian Shahzada, Owais Rodriguez, Rebecca Pot, David Mazumder, Raja |
author_sort | Torcivia, John |
collection | PubMed |
description | Whole genome sequencing (WGS) has helped to revolutionize biology, but the computational challenge remains for extracting valuable inferences from this information. Here, we present the cancer-associated variants from the Cancer Genome Atlas (TCGA) WGS dataset. This set of data will allow cancer researchers to further expand their analysis beyond the exomic regions of the genome to the entire genome. A total of 1342 WGS alignments available from the consortium were processed with VarScan2 and deposited to the NCI Cancer Cloud. The sample set covers 18 different cancers and reveals 157,313,519 pooled (non-unique) cancer-associated single-nucleotide variations (SNVs) across all samples. There was an average of 117,223 SNVs per sample, with a range from 1111 to 775,470 and a standard deviation of 163,273. The dataset was incorporated into BigQuery, which allows for fast access and cross-mapping, which will allow researchers to enrich their current studies with a plethora of newly available genomic data. |
format | Online Article Text |
id | pubmed-10571071 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
record_format | MEDLINE/PubMed |
spelling | pubmed-10571071 2023-10-13 Whole Genome Variant Dataset for Enriching Studies across 18 Different Cancers Torcivia, John Abdilleh, Kawther Seidl, Fabian Shahzada, Owais Rodriguez, Rebecca Pot, David Mazumder, Raja Onco (Basel) Article Whole genome sequencing (WGS) has helped to revolutionize biology, but the computational challenge remains for extracting valuable inferences from this information. Here, we present the cancer-associated variants from the Cancer Genome Atlas (TCGA) WGS dataset. This set of data will allow cancer researchers to further expand their analysis beyond the exomic regions of the genome to the entire genome. A total of 1342 WGS alignments available from the consortium were processed with VarScan2 and deposited to the NCI Cancer Cloud. The sample set covers 18 different cancers and reveals 157,313,519 pooled (non-unique) cancer-associated single-nucleotide variations (SNVs) across all samples. There was an average of 117,223 SNVs per sample, with a range from 1111 to 775,470 and a standard deviation of 163,273. The dataset was incorporated into BigQuery, which allows for fast access and cross-mapping, which will allow researchers to enrich their current studies with a plethora of newly available genomic data. 2022-06 2022-06-17 /pmc/articles/PMC10571071/ /pubmed/37841494 http://dx.doi.org/10.3390/onco2020009 Text en https://creativecommons.org/licenses/by/4.0/ This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Torcivia, John Abdilleh, Kawther Seidl, Fabian Shahzada, Owais Rodriguez, Rebecca Pot, David Mazumder, Raja Whole Genome Variant Dataset for Enriching Studies across 18 Different Cancers |
title | Whole Genome Variant Dataset for Enriching Studies across 18 Different Cancers |
title_full | Whole Genome Variant Dataset for Enriching Studies across 18 Different Cancers |
title_fullStr | Whole Genome Variant Dataset for Enriching Studies across 18 Different Cancers |
title_full_unstemmed | Whole Genome Variant Dataset for Enriching Studies across 18 Different Cancers |
title_short | Whole Genome Variant Dataset for Enriching Studies across 18 Different Cancers |
title_sort | whole genome variant dataset for enriching studies across 18 different cancers |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10571071/ https://www.ncbi.nlm.nih.gov/pubmed/37841494 http://dx.doi.org/10.3390/onco2020009 |
work_keys_str_mv | AT torciviajohn wholegenomevariantdatasetforenrichingstudiesacross18differentcancers AT abdillehkawther wholegenomevariantdatasetforenrichingstudiesacross18differentcancers AT seidlfabian wholegenomevariantdatasetforenrichingstudiesacross18differentcancers AT shahzadaowais wholegenomevariantdatasetforenrichingstudiesacross18differentcancers AT rodriguezrebecca wholegenomevariantdatasetforenrichingstudiesacross18differentcancers AT potdavid wholegenomevariantdatasetforenrichingstudiesacross18differentcancers AT mazumderraja wholegenomevariantdatasetforenrichingstudiesacross18differentcancers |
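
The summary figures quoted in the description field above can be cross-checked with simple arithmetic. The sketch below is only an illustrative consistency check, not part of the published record or the authors' pipeline: it assumes the reported average is the pooled SNV total divided by the number of WGS alignments, and the variable and function names are hypothetical.

```python
# Figures copied from the record's description field.
TOTAL_POOLED_SNVS = 157_313_519  # pooled (non-unique) cancer-associated SNVs
SAMPLE_COUNT = 1342              # WGS alignments processed with VarScan2
REPORTED_MEAN = 117_223          # reported average SNVs per sample
REPORTED_STD = 163_273           # reported standard deviation


def mean_snvs_per_sample(total: int, samples: int) -> float:
    """Return the arithmetic mean of SNVs per sample (assumed definition)."""
    return total / samples


computed_mean = mean_snvs_per_sample(TOTAL_POOLED_SNVS, SAMPLE_COUNT)

# Assumption: the record rounds the mean to the nearest whole SNV.
mean_matches = round(computed_mean) == REPORTED_MEAN

# A standard deviation larger than the mean indicates a highly skewed
# per-sample distribution, in line with the reported range of 1111 to 775,470.
coefficient_of_variation = REPORTED_STD / REPORTED_MEAN

print(f"computed mean: {computed_mean:.2f}")
print(f"matches reported mean: {mean_matches}")
print(f"coefficient of variation: {coefficient_of_variation:.2f}")
```

Under that assumption, the computed mean rounds to the reported 117,223, so the total, sample count, and average in the description are mutually consistent.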