
Constructing germline research cohorts from the discarded reads of clinical tumor sequences

BACKGROUND: Hundreds of thousands of cancer patients have had targeted (panel) tumor sequencing to identify clinically meaningful mutations. In addition to improving patient outcomes, this activity has led to significant discoveries in basic and translational domains. However, the targeted nature of...


Bibliographic Details
Main Authors: Gusev, Alexander; Groha, Stefan; Taraszka, Kodi; Semenov, Yevgeniy R.; Zaitlen, Noah
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8576948/
https://www.ncbi.nlm.nih.gov/pubmed/34749793
http://dx.doi.org/10.1186/s13073-021-00999-4
author Gusev, Alexander
Groha, Stefan
Taraszka, Kodi
Semenov, Yevgeniy R.
Zaitlen, Noah
collection PubMed
description BACKGROUND: Hundreds of thousands of cancer patients have had targeted (panel) tumor sequencing to identify clinically meaningful mutations. In addition to improving patient outcomes, this activity has led to significant discoveries in basic and translational domains. However, the targeted nature of clinical tumor sequencing has a limited scope, especially for germline genetics. In this work, we assess the utility of discarded, off-target reads from tumor-only panel sequencing for the recovery of genome-wide germline genotypes through imputation.
METHODS: We developed a framework for inference of germline variants from tumor panel sequencing, including imputation, quality control, inference of genetic ancestry, germline polygenic risk scores, and HLA alleles. We benchmarked our framework on 833 individuals with tumor sequencing and matched germline SNP array data. We then applied our approach to a prospectively collected panel sequencing cohort of 25,889 tumors.
RESULTS: We demonstrate high to moderate accuracy of each inferred feature relative to direct germline SNP array genotyping: individual common variants were imputed with a mean accuracy (correlation) of 0.86, genetic ancestry was inferred with a correlation of > 0.98, polygenic risk scores were inferred with a correlation of > 0.90, and individual HLA alleles were inferred with a correlation of > 0.80. We demonstrate a minimal influence on the accuracy of somatic copy number alterations and other tumor features. We showcase the feasibility and utility of our framework by analyzing 25,889 tumors and identifying the relationships between genetic ancestry, polygenic risk, and tumor characteristics that could not be studied with conventional on-target tumor data.
CONCLUSIONS: We conclude that targeted tumor sequencing can be leveraged to build rich germline research cohorts from existing data and make our analysis pipeline publicly available to facilitate this effort.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13073-021-00999-4.
format Online
Article
Text
id pubmed-8576948
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8576948 2021-11-10 Constructing germline research cohorts from the discarded reads of clinical tumor sequences Gusev, Alexander Groha, Stefan Taraszka, Kodi Semenov, Yevgeniy R. Zaitlen, Noah Genome Med Research
BioMed Central 2021-11-08 /pmc/articles/PMC8576948/ /pubmed/34749793 http://dx.doi.org/10.1186/s13073-021-00999-4 Text en
© The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Constructing germline research cohorts from the discarded reads of clinical tumor sequences
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8576948/
https://www.ncbi.nlm.nih.gov/pubmed/34749793
http://dx.doi.org/10.1186/s13073-021-00999-4