Origins and characterization of variants shared between databases of somatic and germline human mutations

BACKGROUND: Mutations arise in the human genome in two major settings: the germline and the soma. These settings involve different inheritance patterns, time scales, chromatin structures, and environmental exposures, all of which impact the resulting distribution of substitutions. Nonetheless, many...

Bibliographic Details
Main Authors: Meyerson, William; Leisman, John; Navarro, Fabio C. P.; Gerstein, Mark
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7273669/
https://www.ncbi.nlm.nih.gov/pubmed/32498674
http://dx.doi.org/10.1186/s12859-020-3508-8
author Meyerson, William
Leisman, John
Navarro, Fabio C. P.
Gerstein, Mark
author_sort Meyerson, William
collection PubMed
description BACKGROUND: Mutations arise in the human genome in two major settings: the germline and the soma. These settings involve different inheritance patterns, time scales, chromatin structures, and environmental exposures, all of which impact the resulting distribution of substitutions. Nonetheless, many of the same single nucleotide variants (SNVs) are shared between germline and somatic mutation databases, such as between the gnomAD database of 120,000 germline exomes and the TCGA database of 10,000 somatic exomes. Here, we sought to explain this overlap. RESULTS: After strict filtering to exclude common germline polymorphisms and sites with poor coverage or mappability, we found 336,987 variants shared between the somatic and germline databases. A uniform statistical model explains 34% of these shared variants; a model that incorporates the varying mutation rates of the basic mutation types explains another 50% of shared variants; and a model that includes extended nucleotide contexts (e.g. surrounding 3 bases on either side) explains an additional 4% of shared variants. Analysis of read depth finds mixed evidence that up to 4% of the shared variants may represent germline variants leaked into somatic call sets. 9% of the shared variants are not explained by any model. Sequencing errors and convergent evolution did not account for these. We surveyed other factors as well: Cancers driven by endogenous mutational processes share a greater fraction of variants with the germline, and recently derived germline variants were more likely to be somatically shared than were ancient germline ones. CONCLUSIONS: Overall, we find that shared variants largely represent bona fide biological occurrences of the same variant in the germline and somatic setting and arise primarily because DNA has some of the same basic chemical vulnerabilities in either setting. 
Moreover, we find mixed evidence that somatic call-sets leak appreciable numbers of germline variants, which is relevant to genomic privacy regulations. In future studies, the similar chemical vulnerability of DNA between the somatic and germline settings might be used to help identify disease-related genes by guiding the development of background-mutation models that are informed by both somatic and germline patterns of variation.
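The abstract contrasts a uniform model with one that accounts for the differing mutation rates of the basic mutation types. The sketch below is not the authors' implementation. It only illustrates, under simple independence assumptions, why stratifying by mutability can explain more overlap than a uniform model. The `Stratum` class, the function names, and all counts are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """One class of possible variants (e.g. one mutation type).

    All counts used below are hypothetical toy values, not data from the paper.
    """
    possible: int  # number of candidate variants in this class
    germline: int  # how many of them appear in the germline database
    somatic: int   # how many of them appear in the somatic database


def expected_shared(possible: int, germline: int, somatic: int) -> float:
    """Expected overlap of two sets drawn independently and uniformly
    from `possible` candidates (the mean of a hypergeometric distribution)."""
    return germline * somatic / possible


def uniform_expectation(strata: list[Stratum]) -> float:
    """Pool every class together, ignoring differences in mutability."""
    return expected_shared(
        sum(s.possible for s in strata),
        sum(s.germline for s in strata),
        sum(s.somatic for s in strata),
    )


def stratified_expectation(strata: list[Stratum]) -> float:
    """Compute the expectation within each class, then add the classes up."""
    return sum(expected_shared(s.possible, s.germline, s.somatic) for s in strata)


# Toy example: a small, highly mutable class and a large, less mutable class.
toy = [
    Stratum(possible=1_000_000, germline=200_000, somatic=20_000),
    Stratum(possible=9_000_000, germline=300_000, somatic=30_000),
]

uniform = uniform_expectation(toy)        # 2500.0
stratified = stratified_expectation(toy)  # 5000.0
print(f"uniform model: {uniform:.0f} shared variants expected")
print(f"stratified model: {stratified:.0f} shared variants expected")
```

In this toy setting the stratified expectation is twice the uniform one, because both databases concentrate their variants in the same highly mutable class. This is the same qualitative effect the abstract attributes to shared chemical vulnerabilities of DNA.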
format Online
Article
Text
id pubmed-7273669
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7273669 2020-06-08 Origins and characterization of variants shared between databases of somatic and germline human mutations Meyerson, William Leisman, John Navarro, Fabio C. P. Gerstein, Mark BMC Bioinformatics Research Article BioMed Central 2020-06-04 /pmc/articles/PMC7273669/ /pubmed/32498674 http://dx.doi.org/10.1186/s12859-020-3508-8 Text en © The Author(s). 2020 Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Origins and characterization of variants shared between databases of somatic and germline human mutations
topic Research Article