Performance of neural network basecalling tools for Oxford Nanopore sequencing

BACKGROUND: Basecalling, the computational process of translating raw electrical signal to nucleotide sequence, is of critical importance to the sequencing platforms produced by Oxford Nanopore Technologies (ONT). Here, we examine the performance of different basecalling tools, looking at accuracy a...

Bibliographic Details
Main Authors: Wick, Ryan R., Judd, Louise M., Holt, Kathryn E.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6591954/
https://www.ncbi.nlm.nih.gov/pubmed/31234903
http://dx.doi.org/10.1186/s13059-019-1727-y
author Wick, Ryan R.
Judd, Louise M.
Holt, Kathryn E.
collection PubMed
description BACKGROUND: Basecalling, the computational process of translating raw electrical signal to nucleotide sequence, is of critical importance to the sequencing platforms produced by Oxford Nanopore Technologies (ONT). Here, we examine the performance of different basecalling tools, looking at accuracy at the level of bases within individual reads and at majority-rule consensus basecalls in an assembly. We also investigate some additional aspects of basecalling: training using a taxon-specific dataset, using a larger neural network model and improving consensus basecalls in an assembly by additional signal-level analysis with Nanopolish. RESULTS: Training basecallers on taxon-specific data results in a significant boost in consensus accuracy, mostly due to the reduction of errors in methylation motifs. A larger neural network is able to improve both read and consensus accuracy, but at a cost to speed. Improving consensus sequences (‘polishing’) with Nanopolish somewhat negates the accuracy differences in basecallers, but pre-polish accuracy does have an effect on post-polish accuracy. CONCLUSIONS: Basecalling accuracy has seen significant improvements over the last 2 years. The current version of ONT’s Guppy basecaller performs well overall, with good accuracy and fast performance. If higher accuracy is required, users should consider producing a custom model using a larger neural network and/or training data from the same species. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13059-019-1727-y) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6591954
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6591954 2019-07-08 Genome Biol Research
BioMed Central 2019-06-24 /pmc/articles/PMC6591954/ /pubmed/31234903 http://dx.doi.org/10.1186/s13059-019-1727-y Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Performance of neural network basecalling tools for Oxford Nanopore sequencing
topic Research