Benchmarking of long-read assemblers for prokaryote whole genome sequencing

Background: Data sets from long-read sequencing platforms (Oxford Nanopore Technologies and Pacific Biosciences) allow for most prokaryote genomes to be completely assembled – one contig per chromosome or plasmid. However, the high per-read error rate of long-read sequencing necessitates different a...

Bibliographic Details
Main Authors:	Wick, Ryan R.; Holt, Kathryn E.
Format:	Online Article Text
Language:	English
Published:	F1000 Research Limited 2021
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6966772/ https://www.ncbi.nlm.nih.gov/pubmed/31984131 http://dx.doi.org/10.12688/f1000research.21782.4

author	Wick, Ryan R. Holt, Kathryn E.
collection	PubMed
description	Background: Data sets from long-read sequencing platforms (Oxford Nanopore Technologies and Pacific Biosciences) allow for most prokaryote genomes to be completely assembled – one contig per chromosome or plasmid. However, the high per-read error rate of long-read sequencing necessitates different approaches to assembly than those used for short-read sequencing. Multiple assembly tools (assemblers) exist, which use a variety of algorithms for long-read assembly. Methods: We used 500 simulated read sets and 120 real read sets to assess the performance of eight long-read assemblers (Canu, Flye, Miniasm/Minipolish, NECAT, NextDenovo/NextPolish, Raven, Redbean and Shasta) across a wide variety of genomes and read parameters. Assemblies were assessed on their structural accuracy/completeness, sequence identity, contig circularisation and computational resources used. Results: Canu v2.1 produced reliable assemblies and was good with plasmids, but it performed poorly with circularisation and had the longest runtimes of all assemblers tested. Flye v2.8 was also reliable and made the smallest sequence errors, though it used the most RAM. Miniasm/Minipolish v0.3/v0.1.3 was the most likely to produce clean contig circularisation. NECAT v20200803 was reliable and good at circularisation but tended to make larger sequence errors. NextDenovo/NextPolish v2.3.1/v1.3.1 was reliable with chromosome assembly but bad with plasmid assembly. Raven v1.3.0 was reliable for chromosome assembly, though it did not perform well on small plasmids and had circularisation issues. Redbean v2.5 and Shasta v0.7.0 were computationally efficient but more likely to produce incomplete assemblies. Conclusions: Of the assemblers tested, Flye, Miniasm/Minipolish, NextDenovo/NextPolish and Raven performed best overall. However, no single tool performed well on all metrics, highlighting the need for continued development on long-read assembly algorithms.
format	Online Article Text
id	pubmed-6966772
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	F1000 Research Limited
record_format	MEDLINE/PubMed
spelling	pubmed-6966772 2020-01-23 Benchmarking of long-read assemblers for prokaryote whole genome sequencing Wick, Ryan R. Holt, Kathryn E. F1000Res Research Article F1000 Research Limited 2021-02-01 /pmc/articles/PMC6966772/ /pubmed/31984131 http://dx.doi.org/10.12688/f1000research.21782.4 Text en Copyright: © 2021 Wick RR and Holt KE http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Benchmarking of long-read assemblers for prokaryote whole genome sequencing
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6966772/ https://www.ncbi.nlm.nih.gov/pubmed/31984131 http://dx.doi.org/10.12688/f1000research.21782.4
