DeNovoCNN: a deep learning approach to de novo variant calling in next generation sequencing data
De novo mutations (DNMs) are an important cause of genetic disorders. The accurate identification of DNMs from sequencing data is therefore fundamental to rare disease research and diagnostics. Unfortunately, identifying reliable DNMs remains a major challenge due to sequence errors, uneven coverage...
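Main authors: | Khazeeva, Gelana; Sablauskas, Karolis; van der Sanden, Bart; Steyaert, Wouter; Kwint, Michael; Rots, Dmitrijs; Hinne, Max; van Gerven, Marcel; Yntema, Helger; Vissers, Lisenka; Gilissen, Christian |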
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2022 |
Subjects: | Methods Online |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9508836/ https://www.ncbi.nlm.nih.gov/pubmed/35713566 http://dx.doi.org/10.1093/nar/gkac511 |
_version_ | 1784797105305419776 |
---|---|
author | Khazeeva, Gelana Sablauskas, Karolis van der Sanden, Bart Steyaert, Wouter Kwint, Michael Rots, Dmitrijs Hinne, Max van Gerven, Marcel Yntema, Helger Vissers, Lisenka Gilissen, Christian |
author_facet | Khazeeva, Gelana Sablauskas, Karolis van der Sanden, Bart Steyaert, Wouter Kwint, Michael Rots, Dmitrijs Hinne, Max van Gerven, Marcel Yntema, Helger Vissers, Lisenka Gilissen, Christian |
author_sort | Khazeeva, Gelana |
collection | PubMed |
description | De novo mutations (DNMs) are an important cause of genetic disorders. The accurate identification of DNMs from sequencing data is therefore fundamental to rare disease research and diagnostics. Unfortunately, identifying reliable DNMs remains a major challenge due to sequence errors, uneven coverage, and mapping artifacts. Here, we developed a deep convolutional neural network (CNN) DNM caller (DeNovoCNN), that encodes the alignment of sequence reads for a trio as 160 × 164 resolution images. DeNovoCNN was trained on DNMs of 5616 whole exome sequencing (WES) trios achieving total 96.74% recall and 96.55% precision on the test dataset. We find that DeNovoCNN has increased recall/sensitivity and precision compared to existing DNM calling approaches (GATK, DeNovoGear, DeepTrio, Samtools) based on the Genome in a Bottle reference dataset and independent WES and WGS trios. Validations of DNMs based on Sanger and PacBio HiFi sequencing confirm that DeNovoCNN outperforms existing methods. Most importantly, our results suggest that DeNovoCNN is likely robust against different exome sequencing and analyses approaches, thereby allowing the application on other datasets. DeNovoCNN is freely available as a Docker container and can be run on existing alignment (BAM/CRAM) and variant calling (VCF) files from WES and WGS without a need for variant recalling. |
format | Online Article Text |
id | pubmed-9508836 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-95088362022-09-26 DeNovoCNN: a deep learning approach to de novo variant calling in next generation sequencing data Khazeeva, Gelana Sablauskas, Karolis van der Sanden, Bart Steyaert, Wouter Kwint, Michael Rots, Dmitrijs Hinne, Max van Gerven, Marcel Yntema, Helger Vissers, Lisenka Gilissen, Christian Nucleic Acids Res Methods Online De novo mutations (DNMs) are an important cause of genetic disorders. The accurate identification of DNMs from sequencing data is therefore fundamental to rare disease research and diagnostics. Unfortunately, identifying reliable DNMs remains a major challenge due to sequence errors, uneven coverage, and mapping artifacts. Here, we developed a deep convolutional neural network (CNN) DNM caller (DeNovoCNN), that encodes the alignment of sequence reads for a trio as 160 × 164 resolution images. DeNovoCNN was trained on DNMs of 5616 whole exome sequencing (WES) trios achieving total 96.74% recall and 96.55% precision on the test dataset. We find that DeNovoCNN has increased recall/sensitivity and precision compared to existing DNM calling approaches (GATK, DeNovoGear, DeepTrio, Samtools) based on the Genome in a Bottle reference dataset and independent WES and WGS trios. Validations of DNMs based on Sanger and PacBio HiFi sequencing confirm that DeNovoCNN outperforms existing methods. Most importantly, our results suggest that DeNovoCNN is likely robust against different exome sequencing and analyses approaches, thereby allowing the application on other datasets. DeNovoCNN is freely available as a Docker container and can be run on existing alignment (BAM/CRAM) and variant calling (VCF) files from WES and WGS without a need for variant recalling. Oxford University Press 2022-06-17 /pmc/articles/PMC9508836/ /pubmed/35713566 http://dx.doi.org/10.1093/nar/gkac511 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research. 
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methods Online Khazeeva, Gelana Sablauskas, Karolis van der Sanden, Bart Steyaert, Wouter Kwint, Michael Rots, Dmitrijs Hinne, Max van Gerven, Marcel Yntema, Helger Vissers, Lisenka Gilissen, Christian DeNovoCNN: a deep learning approach to de novo variant calling in next generation sequencing data |
title | DeNovoCNN: a deep learning approach to de novo variant calling in next generation sequencing data |
title_full | DeNovoCNN: a deep learning approach to de novo variant calling in next generation sequencing data |
title_fullStr | DeNovoCNN: a deep learning approach to de novo variant calling in next generation sequencing data |
title_full_unstemmed | DeNovoCNN: a deep learning approach to de novo variant calling in next generation sequencing data |
title_short | DeNovoCNN: a deep learning approach to de novo variant calling in next generation sequencing data |
title_sort | denovocnn: a deep learning approach to de novo variant calling in next generation sequencing data |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9508836/ https://www.ncbi.nlm.nih.gov/pubmed/35713566 http://dx.doi.org/10.1093/nar/gkac511 |
work_keys_str_mv | AT khazeevagelana denovocnnadeeplearningapproachtodenovovariantcallinginnextgenerationsequencingdata AT sablauskaskarolis denovocnnadeeplearningapproachtodenovovariantcallinginnextgenerationsequencingdata AT vandersandenbart denovocnnadeeplearningapproachtodenovovariantcallinginnextgenerationsequencingdata AT steyaertwouter denovocnnadeeplearningapproachtodenovovariantcallinginnextgenerationsequencingdata AT kwintmichael denovocnnadeeplearningapproachtodenovovariantcallinginnextgenerationsequencingdata AT rotsdmitrijs denovocnnadeeplearningapproachtodenovovariantcallinginnextgenerationsequencingdata AT hinnemax denovocnnadeeplearningapproachtodenovovariantcallinginnextgenerationsequencingdata AT vangervenmarcel denovocnnadeeplearningapproachtodenovovariantcallinginnextgenerationsequencingdata AT yntemahelger denovocnnadeeplearningapproachtodenovovariantcallinginnextgenerationsequencingdata AT visserslisenka denovocnnadeeplearningapproachtodenovovariantcallinginnextgenerationsequencingdata AT gilissenchristian denovocnnadeeplearningapproachtodenovovariantcallinginnextgenerationsequencingdata |