A multi-task convolutional deep neural network for variant calling in single molecule sequencing

Bibliographic Details
Main authors: Luo, Ruibang; Sedlazeck, Fritz J.; Lam, Tak-Wah; Schatz, Michael C.
Format: Online, Article, Text
Language: English
Published: Nature Publishing Group UK, 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6397153/
https://www.ncbi.nlm.nih.gov/pubmed/30824707
http://dx.doi.org/10.1038/s41467-019-09025-z

Abstract: The accurate identification of DNA sequence variants is an important, but challenging task in genomics. It is particularly difficult for single molecule sequencing, which has a per-nucleotide error rate of ~5–15%. Meeting this demand, we developed Clairvoyante, a multi-task five-layer convolutional neural network model for predicting variant type (SNP or indel), zygosity, alternative allele and indel length from aligned reads. For the well-characterized NA12878 human sample, Clairvoyante achieves 99.67, 95.78, 90.53% F1-score on 1KP common variants, and 98.65, 92.57, 87.26% F1-score for whole-genome analysis, using Illumina, PacBio, and Oxford Nanopore data, respectively. Training on a second human sample shows Clairvoyante is sample agnostic and finds variants in less than 2 h on a standard server. Furthermore, we present 3,135 variants that are missed using Illumina but supported independently by both PacBio and Oxford Nanopore reads. Clairvoyante is available open-source (https://github.com/aquaskyline/Clairvoyante), with modules to train, utilize and visualize the model.
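The abstract describes a single network that predicts four properties of each candidate site (variant type, zygosity, alternative allele and indel length) and reports accuracy as F1-scores. The Python sketch below illustrates those two ideas under stated assumptions: the label sets, the argmax decoding in `decode_call` and the summed cross-entropy in `multitask_loss` are hypothetical illustrations of a generic multi-task classifier, not Clairvoyante's actual output encoding or training objective (the open-source repository holds the real implementation). `f1_score` uses the standard definition, the harmonic mean of precision and recall.

```python
import math

# Hypothetical label sets for each output head. These are illustrative
# assumptions, not the exact encodings used by Clairvoyante.
VARIANT_TYPES = ["reference", "SNP", "insertion", "deletion"]
ZYGOSITY = ["homozygous", "heterozygous"]
BASES = ["A", "C", "G", "T"]
INDEL_LENGTHS = ["0", "1", "2", "3", "4", ">4"]


def argmax(probs):
    """Return the index of the largest value in a list of probabilities."""
    return max(range(len(probs)), key=lambda i: probs[i])


def decode_call(type_probs, zyg_probs, allele_probs, length_probs):
    """Combine the four per-task predictions for one site into a single call.

    Each argument is the probability vector from one (hypothetical) output
    head; the most probable label is taken independently for each task.
    """
    return {
        "variant_type": VARIANT_TYPES[argmax(type_probs)],
        "zygosity": ZYGOSITY[argmax(zyg_probs)],
        "alt_allele": BASES[argmax(allele_probs)],
        "indel_length": INDEL_LENGTHS[argmax(length_probs)],
    }


def multitask_loss(predictions, targets):
    """Sum of per-task cross-entropy losses.

    predictions: one probability vector per task.
    targets: the true class index for each task.
    Probabilities are clamped to avoid log(0). Summing task losses is a common
    multi-task objective, assumed here for illustration only.
    """
    return sum(-math.log(max(p[t], 1e-12)) for p, t in zip(predictions, targets))


def f1_score(tp, fp, fn):
    """F1-score from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    call = decode_call(
        [0.1, 0.7, 0.1, 0.1],
        [0.2, 0.8],
        [0.05, 0.05, 0.85, 0.05],
        [0.9, 0.02, 0.02, 0.02, 0.02, 0.02],
    )
    print(call)
    print(f1_score(3, 1, 1))
```

For example, `f1_score(3, 1, 1)` gives a precision and recall of 0.75 each, and therefore an F1-score of 0.75. Summing the per-head losses lets one shared convolutional trunk learn features that serve every task, which is the usual motivation for a multi-task design.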

Journal: Nature Communications (Nat Commun), Article, published 2019-03-01
© The Author(s) 2019. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.