Germline contamination and leakage in whole genome somatic single nucleotide variant detection
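Main authors: | Sendorek, Dorota H.; Caloian, Cristian; Ellrott, Kyle; Bare, J. Christopher; Yamaguchi, Takafumi N.; Ewing, Adam D.; Houlahan, Kathleen E.; Norman, Thea C.; Margolin, Adam A.; Stuart, Joshua M.; Boutros, Paul C. |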
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5793408/ https://www.ncbi.nlm.nih.gov/pubmed/29385983 http://dx.doi.org/10.1186/s12859-018-2046-0 |
author | Sendorek, Dorota H. Caloian, Cristian Ellrott, Kyle Bare, J. Christopher Yamaguchi, Takafumi N. Ewing, Adam D. Houlahan, Kathleen E. Norman, Thea C. Margolin, Adam A. Stuart, Joshua M. Boutros, Paul C. |
collection | PubMed |
description | BACKGROUND: The clinical sequencing of cancer genomes to personalize therapy is becoming routine across the world. However, concerns over patient re-identification from these data lead to questions about how tightly access should be controlled. It is not thought to be possible to re-identify patients from somatic variant data. However, somatic variant detection pipelines can mistakenly identify germline variants as somatic ones, a process called “germline leakage”. The rate of germline leakage across different somatic variant detection pipelines is not well understood, and it is uncertain whether or not somatic variant calls should be considered re-identifiable. To fill this gap, we quantified germline leakage across 259 sets of whole-genome somatic single nucleotide variant (SNV) predictions made by 21 teams as part of the ICGC-TCGA DREAM Somatic Mutation Calling Challenge. RESULTS: The median somatic SNV prediction set contained 4325 somatic SNVs and leaked one germline polymorphism. The level of germline leakage was inversely correlated with somatic SNV prediction accuracy and positively correlated with the amount of infiltrating normal cells. The specific germline variants leaked differed by tumour and algorithm. To aid in quantitation and correction of leakage, we created a tool, called GermlineFilter, for use in public-facing somatic SNV databases. CONCLUSIONS: The potential for patient re-identification from leaked germline variants in somatic SNV predictions has led to divergent open data access policies, based on different assessments of the risks. Indeed, a single, well-publicized re-identification event could reshape public perceptions of the value of genomic data sharing. We find that modern somatic SNV prediction pipelines have low germline-leakage rates, which can be further reduced, especially for cloud-sharing, using pre-filtering software.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2046-0) contains supplementary material, which is available to authorized users. |
id | pubmed-5793408 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
source | BMC Bioinformatics, Research Article. BioMed Central, 2018-01-31. |
rights | © The Author(s). 2018. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
topic | Research Article |
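The abstract in the description field defines germline leakage as germline variants that a somatic pipeline reports as somatic. As an illustration only, the sketch below shows one simple way such leakage could be counted and filtered, assuming both call sets are already parsed into (chromosome, position, reference, alternate) records and assuming a somatic call counts as "leaked" when it matches a germline call from the matched normal. `Variant`, `split_leaked`, `leakage_fraction`, and the `match_alleles` flag are hypothetical names introduced here; this is not GermlineFilter's implementation or API, and the paper's actual matching criteria may differ.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal SNV record: chromosome, 1-based position, alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def split_leaked(
    somatic: Iterable[Variant],
    germline: Iterable[Variant],
    match_alleles: bool = True,
) -> Tuple[List[Variant], List[Variant]]:
    """Partition somatic calls into (kept, leaked).

    Assumption: a somatic call is treated as leaked germline signal when it
    matches a germline call at the same chromosome and position and, if
    match_alleles is True, also the same ref/alt alleles.
    """
    def key(v: Variant) -> tuple:
        return (v.chrom, v.pos, v.ref, v.alt) if match_alleles else (v.chrom, v.pos)

    germline_keys = {key(g) for g in germline}  # set for constant-time lookups
    kept: List[Variant] = []
    leaked: List[Variant] = []
    for v in somatic:
        # Route each somatic call to the leaked or kept list.
        (leaked if key(v) in germline_keys else kept).append(v)
    return kept, leaked


def leakage_fraction(n_leaked: int, n_somatic: int) -> float:
    """Fraction of somatic calls flagged as leaked; 0.0 for an empty call set."""
    return n_leaked / n_somatic if n_somatic else 0.0


if __name__ == "__main__":
    somatic = [
        Variant("1", 100, "A", "G"),
        Variant("1", 200, "C", "T"),
        Variant("2", 50, "G", "A"),
    ]
    germline = [Variant("1", 200, "C", "T"), Variant("2", 50, "G", "C")]
    kept, leaked = split_leaked(somatic, germline)
    print(len(kept), len(leaked), leakage_fraction(len(leaked), len(somatic)))
```

With exact-allele matching, only the call at chromosome 1, position 200 is flagged; with `match_alleles=False`, the call at chromosome 2, position 50 is flagged as well because a germline variant sits at that position. Position-only matching is the more conservative choice for a public-facing database, since a different alternate allele at a known germline site may still reveal something about the patient.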