
Benchmarking of next and third generation sequencing technologies and their associated algorithms for de novo genome assembly

Bibliographic details
Main authors: Gavrielatos, Marios; Kyriakidis, Konstantinos; Spandidos, Demetrios A.; Michalopoulos, Ioannis
Format: Online; Article; Text
Language: English
Published: D.A. Spandidos, 2021
Subjects: Articles
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7893683/
https://www.ncbi.nlm.nih.gov/pubmed/33537807
http://dx.doi.org/10.3892/mmr.2021.11890
Collection: PubMed

Abstract: Genome assemblers are computational tools for de novo genome assembly, which reconstruct a genome from large amounts of primary sequencing data. The quality of a genome assembly is estimated by its contiguity and by the number of misassemblies (duplications, deletions, translocations or inversions). The rapid development of sequencing technologies has given rise to new de novo genome assembly strategies. The ultimate goal of these strategies is to exploit the features of each sequencing platform to overcome the weaknesses of individual sequencing types and to produce a complete and correct genome map. In the present study, the hybrid strategy, which is based on Illumina short paired-end reads and Nanopore long reads, was benchmarked using the MaSuRCA and Wengan assemblers. Moreover, the long-read assembly strategy was benchmarked using Canu for Nanopore reads, and Hifiasm and HiCanu for PacBio HiFi reads. The assemblies were performed on a computational cluster with limited computational resources, and their outputs were evaluated in terms of accuracy and computational performance. The PacBio HiFi assembly strategy outperformed the others, whereas Hi-C scaffolding, which is based on the 3D structure of chromatin, is required to increase contiguity, accuracy and completeness when assembling large and complex genomes such as the human genome. Hi-C data are also necessary when using the hybrid assembly strategy. The results revealed that HiFi sequencing has enabled new algorithms that require lower genome coverage than the other strategies, making assembly a less computationally demanding task. Taken together, these developments may lead to the democratisation of genome assembly projects, which are now within reach of smaller labs with limited technical and financial resources.

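The abstract judges assemblies partly by contiguity and compares strategies by the genome coverage they require. The record does not say which contiguity metric the study reported; N50 is assumed here only because it is a widely used one. The minimal Python sketch below computes N50 from a list of contig lengths and mean coverage as total sequenced bases divided by genome size. The function names `n50` and `mean_coverage` and all example numbers are hypothetical illustrations, not values or code from the paper.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """N50: the largest length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    if not contig_lengths:
        raise ValueError("assembly has no contigs")
    total = sum(contig_lengths)
    covered = 0
    # Walk from the longest contig down until half of the bases are covered.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length
    raise AssertionError("unreachable for a non-empty assembly")


def mean_coverage(total_read_bases: int, genome_size: int) -> float:
    """Average sequencing depth: sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_read_bases / genome_size


if __name__ == "__main__":
    # Two hypothetical assemblies with the same 100 bp total length.
    fragmented = [10] * 10
    contiguous = [60, 30, 10]
    print(n50(fragmented))  # 10
    print(n50(contiguous))  # 60
    # Hypothetical read set: 3,000 bp of reads over a 100 bp genome.
    print(mean_coverage(3_000, 100))  # 30.0
```

Both toy assemblies contain the same number of bases, yet the less fragmented one has a much higher N50. N50 says nothing about misassemblies, which have to be assessed separately, for example against a reference.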
Record ID: pubmed-7893683
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Mol Med Rep (section: Articles), D.A. Spandidos, 2021-04; published online 2021-02-02 (record dated 2021-03-08)
Licence: Copyright © Gavrielatos et al. This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.