TMC-SNPdb 2.0: an ethnic-specific database of Indian germline variants

Cancer is a somatic disease. The lack of Indian-specific reference germline variation resources limits the ability to identify true cancer-associated somatic variants among Indian cancer patients. We integrate two recent studies, the GenomeAsia 100K and the Genomics for Public Health in India (IndiG...

Bibliographic Details
Main authors: Desai, Sanket; Mishra, Rohit; Ahmad, Suhail; Hait, Supriya; Joshi, Asim; Dutt, Amit
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9216475/
https://www.ncbi.nlm.nih.gov/pubmed/35551364
http://dx.doi.org/10.1093/database/baac029
author Desai, Sanket
Mishra, Rohit
Ahmad, Suhail
Hait, Supriya
Joshi, Asim
Dutt, Amit
collection PubMed
description Cancer is a somatic disease. The lack of Indian-specific reference germline variation resources limits the ability to identify true cancer-associated somatic variants among Indian cancer patients. We integrate two recent studies, the GenomeAsia 100K and the Genomics for Public Health in India (IndiGen) program, describing genome sequence variations across 598 and 1029 healthy individuals of Indian origin, respectively, along with the unique variants generated from our in-house 173 normal germline samples derived from cancer patients to generate the Tata Memorial Centre-SNP database (TMC-SNPdb) 2.0. To show its utility, GATK/Mutect2-based somatic variant calling was performed on 224 in-house tumor samples to demonstrate a reduction in false-positive somatic variants. In addition to the ethnic-specific variants from the GenomeAsia 100K and IndiGenomes databases, 305 132 unique variants generated from 173 in-house normal germline samples derived from cancer patients of Indian origin constitute the Indian-specific TMC-SNPdb 2.0. Of the 305 132 unique variants, 11.13% were found in the coding region, with missense variants (31.3%) as the most predominant category. Among the non-coding variations, intronic variants (49%) were the highest contributors. The non-synonymous to synonymous SNP ratio was observed to be 1.9, consistent with the previous version of TMC-SNPdb and the literature. Using TMC-SNPdb 2.0, we analyzed whole-exome sequence data from 224 in-house tumor samples (180 paired and 44 orphans). We show an average depletion of 3.44% of variants per paired tumor and a significantly higher depletion (P-value < 0.001) for orphan tumors (4.21%), demonstrating the utility of the rare, unique variants found in the ethnic-specific variant datasets in reducing false-positive somatic mutations. TMC-SNPdb 2.0 is the most exhaustive open-source reference database of germline variants occurring across 1800 Indian individuals to analyze cancer genomes and other genetic disorders.
The database and toolkit package is available for download. Database URL: http://www.actrec.gov.in/pi-webpages/AmitDutt/TMCSNPdb2/TMCSNPdb2.html
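The depletion figures in the description come from removing candidate somatic calls that match known germline variants. The following is only a minimal sketch of that idea, not the TMC-SNPdb toolkit or the GATK/Mutect2 workflow the authors used: the `Variant` type, both function names, and the toy coordinates are hypothetical, and matching on exact chromosome, position, reference and alternate allele is an assumption made for illustration.

```python
from typing import Iterable, List, NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical minimal variant key: chromosome, position, ref and alt allele."""
    chrom: str
    pos: int
    ref: str
    alt: str


def filter_germline(calls: Iterable[Variant], germline: Set[Variant]) -> List[Variant]:
    """Keep only candidate somatic calls that are absent from the germline set."""
    return [v for v in calls if v not in germline]


def depletion_percent(n_before: int, n_after: int) -> float:
    """Percentage of calls removed by the germline filter."""
    if n_before == 0:
        return 0.0
    return 100.0 * (n_before - n_after) / n_before


# Toy data (invented coordinates, for illustration only).
germline_db = {
    Variant("chr1", 12345, "A", "G"),
    Variant("chr7", 55555, "C", "T"),
}
tumor_calls = [
    Variant("chr1", 12345, "A", "G"),   # present in the germline set, so removed
    Variant("chr2", 22222, "G", "A"),
    Variant("chr17", 77777, "T", "C"),
    Variant("chr12", 33333, "G", "T"),
]

kept = filter_germline(tumor_calls, germline_db)
print(len(kept), depletion_percent(len(tumor_calls), len(kept)))
```

For an orphan tumor (one without a matched normal sample), a filter of this kind is the only source of germline information, which is consistent with the higher depletion the description reports for orphan tumors.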
format Online
Article
Text
id pubmed-9216475
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9216475 2022-06-23 TMC-SNPdb 2.0: an ethnic-specific database of Indian germline variants Desai, Sanket Mishra, Rohit Ahmad, Suhail Hait, Supriya Joshi, Asim Dutt, Amit Database (Oxford) Database Update Oxford University Press 2022-05-11 /pmc/articles/PMC9216475/ /pubmed/35551364 http://dx.doi.org/10.1093/database/baac029 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title TMC-SNPdb 2.0: an ethnic-specific database of Indian germline variants
topic Database Update
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9216475/
https://www.ncbi.nlm.nih.gov/pubmed/35551364
http://dx.doi.org/10.1093/database/baac029