Species-specific basecallers improve actual accuracy of nanopore sequencing in plants

BACKGROUND: Long-read sequencing platforms offered by Oxford Nanopore Technologies (ONT) allow native DNA containing epigenetic modifications to be directly sequenced, but can be limited by lower per-base accuracies. A key step post-sequencing is basecalling, the process of converting raw electrical...

Bibliographic Details
Main Authors: Ferguson, Scott; McLay, Todd; Andrew, Rose L.; Bruhl, Jeremy J.; Schwessinger, Benjamin; Borevitz, Justin; Jones, Ashley
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects: Methodology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9749173/
https://www.ncbi.nlm.nih.gov/pubmed/36517904
http://dx.doi.org/10.1186/s13007-022-00971-2
author Ferguson, Scott
McLay, Todd
Andrew, Rose L.
Bruhl, Jeremy J.
Schwessinger, Benjamin
Borevitz, Justin
Jones, Ashley
collection PubMed
description BACKGROUND: Long-read sequencing platforms offered by Oxford Nanopore Technologies (ONT) allow native DNA containing epigenetic modifications to be directly sequenced, but can be limited by lower per-base accuracies. A key step post-sequencing is basecalling, the process of converting raw electrical signals produced by the sequencing device into nucleotide sequences. This is challenging as current basecallers are primarily based on mixtures of model species for training. Here we utilise both ONT PromethION and higher accuracy PacBio Sequel II HiFi sequencing on two plants, Phebalium stellatum and Xanthorrhoea johnsonii, to train species-specific basecaller models with the aim of improving per-base accuracy. We investigate sequencing accuracies achieved by ONT basecallers and assess accuracy gains by training single-species and species-specific basecaller models. We also evaluate accuracy gains from ONT’s improved flowcells (R10.4, FLO-PRO112) and sequencing kits (SQK-LSK112). For the truth dataset for both model training and accuracy assessment, we developed highly accurate, contiguous diploid reference genomes with PacBio Sequel II HiFi reads.
RESULTS: Basecalling with ONT Guppy 5 and 6 super-accurate gave almost identical results, attaining read accuracies of 91.96% and 94.15%. Guppy’s plant-specific model gave highly mixed results, attaining read accuracies of 91.47% and 96.18%. Species-specific basecalling models improved read accuracy, attaining 93.24% and 95.16% read accuracies. R10.4 sequencing kits also improve sequencing accuracy, attaining read accuracies of 95.46% (super-accurate) and 96.87% (species-specific).
CONCLUSIONS: The use of a single mixed-species basecaller model, such as ONT Guppy super-accurate, may be reducing the accuracy of nanopore sequencing, due to conflicting genome biology within the training dataset and study species. Training of single-species and genome-specific basecaller models improves read accuracy. Studies that aim to do large-scale long-read genotyping would primarily benefit from training their own basecalling models. Such studies could use sequencing accuracy gains and improving bioinformatics tools to improve study outcomes.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13007-022-00971-2.
format Online
Article
Text
id pubmed-9749173
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9749173 2022-12-15 Species-specific basecallers improve actual accuracy of nanopore sequencing in plants Ferguson, Scott; McLay, Todd; Andrew, Rose L.; Bruhl, Jeremy J.; Schwessinger, Benjamin; Borevitz, Justin; Jones, Ashley Plant Methods Methodology BioMed Central 2022-12-14 /pmc/articles/PMC9749173/ /pubmed/36517904 http://dx.doi.org/10.1186/s13007-022-00971-2 Text en
© The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Species-specific basecallers improve actual accuracy of nanopore sequencing in plants
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9749173/
https://www.ncbi.nlm.nih.gov/pubmed/36517904
http://dx.doi.org/10.1186/s13007-022-00971-2