TMC-SNPdb 2.0: an ethnic-specific database of Indian germline variants
Cancer is a somatic disease. The lack of Indian-specific reference germline variation resources limits the ability to identify true cancer-associated somatic variants among Indian cancer patients. We integrate two recent studies, the GenomeAsia 100K and the Genomics for Public Health in India (IndiG...
Main authors: | Desai, Sanket; Mishra, Rohit; Ahmad, Suhail; Hait, Supriya; Joshi, Asim; Dutt, Amit |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2022 |
Subjects: | Database Update |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9216475/ https://www.ncbi.nlm.nih.gov/pubmed/35551364 http://dx.doi.org/10.1093/database/baac029 |
_version_ | 1784731430304088064 |
---|---|
author | Desai, Sanket; Mishra, Rohit; Ahmad, Suhail; Hait, Supriya; Joshi, Asim; Dutt, Amit |
author_facet | Desai, Sanket; Mishra, Rohit; Ahmad, Suhail; Hait, Supriya; Joshi, Asim; Dutt, Amit |
author_sort | Desai, Sanket |
collection | PubMed |
description | Cancer is a somatic disease. The lack of Indian-specific reference germline variation resources limits the ability to identify true cancer-associated somatic variants among Indian cancer patients. We integrate two recent studies, the GenomeAsia 100K and the Genomics for Public Health in India (IndiGen) program, describing genome sequence variations across 598 and 1029 healthy individuals of Indian origin, respectively, along with the unique variants generated from 173 in-house normal germline samples derived from cancer patients, to generate the Tata Memorial Centre-SNP database (TMC-SNPdb) 2.0. To show its utility, GATK/Mutect2-based somatic variant calling was performed on 224 in-house tumor samples to demonstrate a reduction in false-positive somatic variants. In addition to the ethnic-specific variants from the GenomeAsia 100K and IndiGenomes databases, 305 132 unique variants generated from 173 in-house normal germline samples derived from cancer patients of Indian origin constitute the Indian-specific TMC-SNPdb 2.0. Of the 305 132 unique variants, 11.13% were found in the coding region, with missense variants (31.3%) as the predominant category. Among the non-coding variations, intronic variants (49%) were the highest contributors. The non-synonymous to synonymous SNP ratio was observed to be 1.9, consistent with the previous version of TMC-SNPdb and the literature. Using TMC-SNPdb 2.0, we analyzed whole-exome sequence data from 224 in-house tumor samples (180 paired and 44 orphans). We show an average depletion of 3.44% of variants per paired tumor and a significantly higher depletion (P-value < 0.001) for orphan tumors (4.21%), demonstrating the utility of the rare, unique variants found in the ethnic-specific variant datasets in reducing false-positive somatic mutations.
TMC-SNPdb 2.0 is the most exhaustive open-source reference database of germline variants occurring across 1800 Indian individuals to analyze cancer genomes and other genetic disorders. The database and toolkit package is available for download at the following Database URL: http://www.actrec.gov.in/pi-webpages/AmitDutt/TMCSNPdb2/TMCSNPdb2.html |
format | Online Article Text |
id | pubmed-9216475 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9216475 2022-06-23 TMC-SNPdb 2.0: an ethnic-specific database of Indian germline variants Desai, Sanket; Mishra, Rohit; Ahmad, Suhail; Hait, Supriya; Joshi, Asim; Dutt, Amit Database (Oxford) Database Update Cancer is a somatic disease. The lack of Indian-specific reference germline variation resources limits the ability to identify true cancer-associated somatic variants among Indian cancer patients. We integrate two recent studies, the GenomeAsia 100K and the Genomics for Public Health in India (IndiGen) program, describing genome sequence variations across 598 and 1029 healthy individuals of Indian origin, respectively, along with the unique variants generated from 173 in-house normal germline samples derived from cancer patients, to generate the Tata Memorial Centre-SNP database (TMC-SNPdb) 2.0. To show its utility, GATK/Mutect2-based somatic variant calling was performed on 224 in-house tumor samples to demonstrate a reduction in false-positive somatic variants. In addition to the ethnic-specific variants from the GenomeAsia 100K and IndiGenomes databases, 305 132 unique variants generated from 173 in-house normal germline samples derived from cancer patients of Indian origin constitute the Indian-specific TMC-SNPdb 2.0. Of the 305 132 unique variants, 11.13% were found in the coding region, with missense variants (31.3%) as the predominant category. Among the non-coding variations, intronic variants (49%) were the highest contributors. The non-synonymous to synonymous SNP ratio was observed to be 1.9, consistent with the previous version of TMC-SNPdb and the literature. Using TMC-SNPdb 2.0, we analyzed whole-exome sequence data from 224 in-house tumor samples (180 paired and 44 orphans).
We show an average depletion of 3.44% of variants per paired tumor and a significantly higher depletion (P-value < 0.001) for orphan tumors (4.21%), demonstrating the utility of the rare, unique variants found in the ethnic-specific variant datasets in reducing false-positive somatic mutations. TMC-SNPdb 2.0 is the most exhaustive open-source reference database of germline variants occurring across 1800 Indian individuals to analyze cancer genomes and other genetic disorders. The database and toolkit package is available for download at the following Database URL: http://www.actrec.gov.in/pi-webpages/AmitDutt/TMCSNPdb2/TMCSNPdb2.html Oxford University Press 2022-05-11 /pmc/articles/PMC9216475/ /pubmed/35551364 http://dx.doi.org/10.1093/database/baac029 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Database Update Desai, Sanket Mishra, Rohit Ahmad, Suhail Hait, Supriya Joshi, Asim Dutt, Amit TMC-SNPdb 2.0: an ethnic-specific database of Indian germline variants |
title | TMC-SNPdb 2.0: an ethnic-specific database of Indian germline variants |
title_full | TMC-SNPdb 2.0: an ethnic-specific database of Indian germline variants |
title_fullStr | TMC-SNPdb 2.0: an ethnic-specific database of Indian germline variants |
title_full_unstemmed | TMC-SNPdb 2.0: an ethnic-specific database of Indian germline variants |
title_short | TMC-SNPdb 2.0: an ethnic-specific database of Indian germline variants |
title_sort | tmc-snpdb 2.0: an ethnic-specific database of indian germline variants |
topic | Database Update |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9216475/ https://www.ncbi.nlm.nih.gov/pubmed/35551364 http://dx.doi.org/10.1093/database/baac029 |
work_keys_str_mv | AT desaisanket tmcsnpdb20anethnicspecificdatabaseofindiangermlinevariants AT mishrarohit tmcsnpdb20anethnicspecificdatabaseofindiangermlinevariants AT ahmadsuhail tmcsnpdb20anethnicspecificdatabaseofindiangermlinevariants AT haitsupriya tmcsnpdb20anethnicspecificdatabaseofindiangermlinevariants AT joshiasim tmcsnpdb20anethnicspecificdatabaseofindiangermlinevariants AT duttamit tmcsnpdb20anethnicspecificdatabaseofindiangermlinevariants |
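
The description reports a "depletion" of candidate somatic variants once calls matching known germline variants are removed. The Python sketch below illustrates that idea on toy data only; it is not the authors' toolkit or the GATK/Mutect2 pipeline. The `Variant` record, `filter_germline`, and `depletion_percent` are hypothetical names introduced here, and matching on exact chromosome, position, and alleles is an assumption. The sketch also checks that the three cohort sizes quoted in the description (598, 1029, and 173) sum to the stated 1800 individuals.

```python
from typing import Iterable, List, NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical minimal variant record, matched on all four fields."""
    chrom: str
    pos: int
    ref: str
    alt: str


def filter_germline(calls: Iterable[Variant], germline_db: Set[Variant]) -> List[Variant]:
    """Keep only the calls that are absent from the germline reference set."""
    return [v for v in calls if v not in germline_db]


def depletion_percent(n_before: int, n_after: int) -> float:
    """Percentage of calls removed by the germline filter."""
    if n_before == 0:
        return 0.0  # nothing to deplete; avoids division by zero
    return 100.0 * (n_before - n_after) / n_before


# Cohort sizes as quoted in the record's description.
COHORT_SIZES = {"GenomeAsia 100K": 598, "IndiGen": 1029, "in-house": 173}
TOTAL_INDIVIDUALS = sum(COHORT_SIZES.values())

if __name__ == "__main__":
    # Toy candidate calls for one tumor (made-up coordinates, not real data).
    calls = [
        Variant("chr1", 100, "A", "G"),
        Variant("chr1", 200, "C", "T"),
        Variant("chr2", 300, "G", "A"),
        Variant("chr3", 400, "T", "C"),
    ]
    # Toy germline set standing in for a population database.
    germline_db = {Variant("chr1", 200, "C", "T")}

    kept = filter_germline(calls, germline_db)
    print(len(kept), depletion_percent(len(calls), len(kept)))
    print(TOTAL_INDIVIDUALS)
```

Real variant matching usually also requires allele normalization and a shared reference build, which this sketch leaves out.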