Evaluating Illumina-, Nanopore-, and PacBio-based genome assembly strategies with the bald notothen, Trematomus borchgrevinki

For any genome-based research, a robust genome assembly is required. De novo assembly strategies have evolved with changes in DNA sequencing technologies and have been through at least 3 phases: (1) short-read only, (2) short- and long-read hybrid, and (3) long-read only assemblies. Each of the phas...

Bibliographic Details
Main authors: Rayamajhi, Niraj, Cheng, Chi-Hing Christina, Catchen, Julian M
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects: Investigation
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9635638/
https://www.ncbi.nlm.nih.gov/pubmed/35904764
http://dx.doi.org/10.1093/g3journal/jkac192
collection PubMed
description For any genome-based research, a robust genome assembly is required. De novo assembly strategies have evolved with changes in DNA sequencing technologies and have been through at least 3 phases: (1) short-read only, (2) short- and long-read hybrid, and (3) long-read only assemblies. Each of the phases has its own error model. We hypothesized that hidden short-read scaffolding errors and erroneous long-read contigs degrade the quality of short- and long-read hybrid assemblies. We assembled the genome of Trematomus borchgrevinki from data generated during each of the 3 phases and assessed the quality problems we encountered. We developed strategies such as k-mer-assembled region replacement, parameter optimization, and long-read sampling to address the error models. We demonstrated that a k-mer-based strategy improved short-read assemblies as measured by Benchmarking Universal Single-Copy Ortholog while mate-pair libraries introduced hidden scaffolding errors and perturbed Benchmarking Universal Single-Copy Ortholog scores. Furthermore, we found that although hybrid assemblies can generate higher contiguity they tend to suffer from lower quality. In addition, we found long-read-only assemblies can be optimized for contiguity by subsampling length-restricted raw reads. Our results indicate that long-read contig assembly is the current best choice and that assemblies from phase I and phase II were of lower quality.
format Online
Article
Text
id pubmed-9635638
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9635638 2022-11-07 G3 (Bethesda) Investigation Oxford University Press 2022-07-29 /pmc/articles/PMC9635638/ /pubmed/35904764 http://dx.doi.org/10.1093/g3journal/jkac192 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of Genetics Society of America.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Investigation