Pushing the limits of HiFi assemblies reveals centromere diversity between two Arabidopsis thaliana genomes

Although long-read sequencing can often enable chromosome-level reconstruction of genomes, it is still unclear how one can routinely obtain gapless assemblies. In the model plant Arabidopsis thaliana, other than the reference accession Col-0, all other accessions de novo assembled with long-reads un...

Bibliographic Details
Main authors: Rabanal, Fernando A; Gräff, Maike; Lanz, Christa; Fritschi, Katrin; Llaca, Victor; Lang, Michelle; Carbonell-Bejerano, Pablo; Henderson, Ian; Weigel, Detlef
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects: Genomics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9757041/
https://www.ncbi.nlm.nih.gov/pubmed/36453992
http://dx.doi.org/10.1093/nar/gkac1115
author Rabanal, Fernando A
Gräff, Maike
Lanz, Christa
Fritschi, Katrin
Llaca, Victor
Lang, Michelle
Carbonell-Bejerano, Pablo
Henderson, Ian
Weigel, Detlef
collection PubMed
description Although long-read sequencing can often enable chromosome-level reconstruction of genomes, it is still unclear how one can routinely obtain gapless assemblies. In the model plant Arabidopsis thaliana, other than the reference accession Col-0, all other accessions de novo assembled with long-reads until now have used PacBio continuous long reads (CLR). Although these assemblies sometimes achieved chromosome-arm level contigs, they inevitably broke near the centromeres, excluding megabases of DNA from analysis in pan-genome projects. Since PacBio high-fidelity (HiFi) reads circumvent the high error rate of CLR technologies, albeit at the expense of read length, we compared a CLR assembly of accession Eyach15-2 to HiFi assemblies of the same sample. The use of five different assemblers starting from subsampled data allowed us to evaluate the impact of coverage and read length. We found that centromeres and rDNA clusters are responsible for 71% of contig breaks in the CLR scaffolds, while relatively short stretches of GA/TC repeats are at the core of >85% of the unfilled gaps in our best HiFi assemblies. Since the HiFi technology consistently enabled us to reconstruct gapless centromeres and 5S rDNA clusters, we demonstrate the value of the approach by comparing these previously inaccessible regions of the genome between the Eyach15-2 accession and the reference accession Col-0.
format Online
Article
Text
id pubmed-9757041
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9757041 2022-12-19
Nucleic Acids Res
Genomics
Oxford University Press 2022-12-01
/pmc/articles/PMC9757041/
/pubmed/36453992
http://dx.doi.org/10.1093/nar/gkac1115
Text en
© The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Pushing the limits of HiFi assemblies reveals centromere diversity between two Arabidopsis thaliana genomes
topic Genomics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9757041/
https://www.ncbi.nlm.nih.gov/pubmed/36453992
http://dx.doi.org/10.1093/nar/gkac1115