Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird‐of‐paradise

Genome assemblies are currently being produced at an impressive rate by consortia and individual laboratories. The low costs and increasing efficiency of sequencing technologies now enable assembling genomes at unprecedented quality and contiguity. However, the difficulty in assembling repeat‐rich a...

Bibliographic Details
Main authors: Peona, Valentina, Blom, Mozes P. K., Xu, Luohao, Burri, Reto, Sullivan, Shawn, Bunikis, Ignas, Liachko, Ivan, Haryoko, Tri, Jønsson, Knud A., Zhou, Qi, Irestedt, Martin, Suh, Alexander
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7757076/
https://www.ncbi.nlm.nih.gov/pubmed/32937018
http://dx.doi.org/10.1111/1755-0998.13252
_version_ 1783626672381100032
author Peona, Valentina
Blom, Mozes P. K.
Xu, Luohao
Burri, Reto
Sullivan, Shawn
Bunikis, Ignas
Liachko, Ivan
Haryoko, Tri
Jønsson, Knud A.
Zhou, Qi
Irestedt, Martin
Suh, Alexander
author_sort Peona, Valentina
collection PubMed
description Genome assemblies are currently being produced at an impressive rate by consortia and individual laboratories. The low costs and increasing efficiency of sequencing technologies now enable assembling genomes at unprecedented quality and contiguity. However, the difficulty in assembling repeat‐rich and GC‐rich regions (genomic “dark matter”) limits insights into the evolution of genome structure and regulatory networks. Here, we compare the efficiency of currently available sequencing technologies (short/linked/long reads and proximity ligation maps) and combinations thereof in assembling genomic dark matter. By adopting different de novo assembly strategies, we compare individual draft assemblies to a curated multiplatform reference assembly and identify the genomic features that cause gaps within each assembly. We show that a multiplatform assembly implementing long‐read, linked‐read and proximity sequencing technologies performs best at recovering transposable elements, multicopy MHC genes, GC‐rich microchromosomes and the repeat‐rich W chromosome. Telomere‐to‐telomere assemblies are not a reality yet for most organisms, but by leveraging technology choice it is now possible to minimize genome assembly gaps for downstream analysis. We provide a roadmap to tailor sequencing projects for optimized completeness of both the coding and noncoding parts of nonmodel genomes.
format Online
Article
Text
id pubmed-7757076
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-7757076 2020-12-28 Mol Ecol Resour RESOURCE ARTICLES John Wiley and Sons Inc. 2020-10-10 2021-01 /pmc/articles/PMC7757076/ /pubmed/32937018 http://dx.doi.org/10.1111/1755-0998.13252 Text en © 2020 The Authors. Molecular Ecology Resources published by John Wiley & Sons Ltd. This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird‐of‐paradise
topic RESOURCE ARTICLES
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7757076/
https://www.ncbi.nlm.nih.gov/pubmed/32937018
http://dx.doi.org/10.1111/1755-0998.13252