
DeNovoCNN: a deep learning approach to de novo variant calling in next generation sequencing data


Bibliographic Details
Main authors: Khazeeva, Gelana, Sablauskas, Karolis, van der Sanden, Bart, Steyaert, Wouter, Kwint, Michael, Rots, Dmitrijs, Hinne, Max, van Gerven, Marcel, Yntema, Helger, Vissers, Lisenka, Gilissen, Christian
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects: Methods Online
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9508836/
https://www.ncbi.nlm.nih.gov/pubmed/35713566
http://dx.doi.org/10.1093/nar/gkac511
collection PubMed
description De novo mutations (DNMs) are an important cause of genetic disorders. The accurate identification of DNMs from sequencing data is therefore fundamental to rare disease research and diagnostics. Unfortunately, identifying reliable DNMs remains a major challenge due to sequencing errors, uneven coverage, and mapping artifacts. Here, we developed a deep convolutional neural network (CNN) DNM caller (DeNovoCNN) that encodes the alignment of sequence reads for a trio as 160 × 164 resolution images. DeNovoCNN was trained on DNMs from 5616 whole exome sequencing (WES) trios, achieving a total recall of 96.74% and a precision of 96.55% on the test dataset. We find that DeNovoCNN has higher recall (sensitivity) and precision than existing DNM calling approaches (GATK, DeNovoGear, DeepTrio, Samtools) on the Genome in a Bottle reference dataset and on independent WES and WGS trios. Validation of DNMs by Sanger and PacBio HiFi sequencing confirms that DeNovoCNN outperforms existing methods. Most importantly, our results suggest that DeNovoCNN is likely robust to different exome sequencing and analysis approaches, thereby allowing its application to other datasets. DeNovoCNN is freely available as a Docker container and can be run on existing alignment (BAM/CRAM) and variant calling (VCF) files from WES and WGS without the need for variant recalling.
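The abstract states only that DeNovoCNN turns a trio's read alignments into 160 × 164 images. The minimal Python sketch below shows one way such an image could be built; the layout is an assumption, not something this record specifies: each row holds one read (up to 160), the 164 columns are a 41-bp window around the candidate site with each base one-hot encoded over A, C, G, T (41 × 4 = 164), and the three channels hold the child, father and mother. `encode_trio`, `MAX_READS`, `WINDOW` and the `(start, sequence)` read representation are hypothetical names; real inputs would come from BAM/CRAM files, and this is not the DeNovoCNN implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical image layout (an assumption, not the published DeNovoCNN encoding).
MAX_READS = 160                          # image rows: one read per row
WINDOW = 41                              # bases in the window around the candidate site
BASES = "ACGT"                           # one-hot order; 41 * 4 = 164 image columns
SAMPLES = ("child", "father", "mother")  # one channel per trio member

# A read is (start offset relative to the window's first base, read sequence).
Read = Tuple[int, str]


def encode_trio(reads_by_sample: Dict[str, List[Read]]) -> List[List[List[int]]]:
    """Build a MAX_READS x (WINDOW * 4) x 3 image of 0/1 values (hypothetical layout)."""
    n_cols = WINDOW * len(BASES)
    image = [[[0] * len(SAMPLES) for _ in range(n_cols)] for _ in range(MAX_READS)]
    for channel, sample in enumerate(SAMPLES):
        # Keep at most MAX_READS reads per sample; extra reads are simply dropped.
        reads = reads_by_sample.get(sample, [])[:MAX_READS]
        for row, (start, seq) in enumerate(reads):
            for offset, base in enumerate(seq.upper()):
                pos = start + offset
                # Ignore bases outside the window and non-ACGT calls such as 'N'.
                if 0 <= pos < WINDOW and base in BASES:
                    image[row][pos * len(BASES) + BASES.index(base)][channel] = 1
    return image


if __name__ == "__main__":
    img = encode_trio({
        "child": [(18, "ACGTA")],       # a short read overlapping the window centre
        "father": [(0, "A" * WINDOW)],  # a read spanning the whole window
    })
    print(len(img), len(img[0]), len(img[0][0]))  # 160 164 3
```

Fixing the image size gives the CNN the same input shape for every candidate site; truncating reads beyond the row limit is the simplest choice here, and a production caller might subsample or order reads differently.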
id pubmed-9508836
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Nucleic Acids Res, Methods Online, published 2022-06-17
rights © The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Methods Online