
Polishing copy number variant calls on exome sequencing data via deep learning

Accurate and efficient detection of copy number variants (CNVs) is of critical importance owing to their significant association with complex genetic diseases. Although algorithms that use whole-genome sequencing (WGS) data provide stable results with mostly valid statistical assumptions, copy numbe...


Bibliographic Details
Main authors: Özden, Furkan; Alkan, Can; Çiçek, A. Ercüment
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9248885/
https://www.ncbi.nlm.nih.gov/pubmed/35697522
http://dx.doi.org/10.1101/gr.274845.120
collection PubMed
description Accurate and efficient detection of copy number variants (CNVs) is of critical importance owing to their significant association with complex genetic diseases. Although algorithms that use whole-genome sequencing (WGS) data provide stable results with mostly valid statistical assumptions, copy number detection on whole-exome sequencing (WES) data shows comparatively lower accuracy. This is unfortunate as WES data are cost-efficient, compact, and relatively ubiquitous. The bottleneck is primarily due to the noncontiguous nature of the targeted capture: biases in targeted genomic hybridization, GC content, targeting probes, and sample batching during sequencing. Here, we present a novel deep learning model, DECoNT, which uses the matched WES and WGS data, and learns to correct the copy number variations reported by any off-the-shelf WES-based germline CNV caller. We train DECoNT on the 1000 Genomes Project data, and we show that we can efficiently triple the duplication call precision and double the deletion call precision of the state-of-the-art algorithms. We also show that our model consistently improves the performance independent of (1) sequencing technology, (2) exome capture kit, and (3) CNV caller. Using DECoNT as a universal exome CNV call polisher has the potential to improve the reliability of germline CNV detection on WES data sets.
id pubmed-9248885
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Genome Res, Method, 2022-06 (record dated 2022-12-01)
rights © 2022 Özden et al.; Published by Cold Spring Harbor Laboratory Press. This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at https://creativecommons.org/licenses/by-nc/4.0/.
topic Method