
Quinoa genome assembly employing genomic variation for guided scaffolding

KEY MESSAGE: We propose to use the natural variation between individuals of a population for genome assembly scaffolding. In today’s genome projects, multiple accessions get sequenced, leading to variant catalogs. Using such information to improve genome assemblies is attractive both cost-wise as we...


Bibliographic Details
Main authors: Bodrug-Schepers, Alexandrina, Stralis-Pavese, Nancy, Buerstmayr, Hermann, Dohm, Juliane C., Himmelbauer, Heinz
Format: Online Article Text
Language: English
Published: Springer Berlin Heidelberg 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8519820/
https://www.ncbi.nlm.nih.gov/pubmed/34365519
http://dx.doi.org/10.1007/s00122-021-03915-x
_version_ 1784584533287370752
author Bodrug-Schepers, Alexandrina
Stralis-Pavese, Nancy
Buerstmayr, Hermann
Dohm, Juliane C.
Himmelbauer, Heinz
author_facet Bodrug-Schepers, Alexandrina
Stralis-Pavese, Nancy
Buerstmayr, Hermann
Dohm, Juliane C.
Himmelbauer, Heinz
author_sort Bodrug-Schepers, Alexandrina
collection PubMed
description KEY MESSAGE: We propose to use the natural variation between individuals of a population for genome assembly scaffolding. In today’s genome projects, multiple accessions get sequenced, leading to variant catalogs. Using such information to improve genome assemblies is attractive both cost-wise as well as scientifically, because the value of an assembly increases with its contiguity. We conclude that haplotype information is a valuable resource to group and order contigs toward the generation of pseudomolecules. ABSTRACT: Quinoa (Chenopodium quinoa) has been under cultivation in Latin America for more than 7500 years. Recently, quinoa has gained increasing attention due to its stress resistance and its nutritional value. We generated a novel quinoa genome assembly for the Bolivian accession CHEN125 using PacBio long-read sequencing data (assembly size 1.32 Gbp, initial N50 size 608 kbp). Next, we re-sequenced 50 quinoa accessions from Peru and Bolivia. This set of accessions differed at 4.4 million single-nucleotide variant (SNV) positions compared to CHEN125 (1.4 million SNV positions on average per accession). We show how to exploit variation in accessions that are distantly related to establish a genome-wide ordered set of contigs for guided scaffolding of a reference assembly. The method is based on detecting shared haplotypes and their expected continuity throughout the genome (i.e., the effect of linkage disequilibrium), as an extension of what is expected in mapping populations where only a few haplotypes are present. We test the approach using Arabidopsis thaliana data from different populations. After applying the method on our CHEN125 quinoa assembly we validated the results with mate-pairs, genetic markers, and another quinoa assembly originating from a Chilean cultivar. 
We show consistency between these information sources and the haplotype-based relations as determined by us and obtain an improved assembly with an N50 size of 1079 kbp and ordered contig groups of up to 39.7 Mbp. We conclude that haplotype information in distantly related individuals of the same species is a valuable resource to group and order contigs according to their adjacency in the genome toward the generation of pseudomolecules. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s00122-021-03915-x.
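The description above reports assembly contiguity as N50 (608 kbp before and 1079 kbp after haplotype-guided scaffolding). As a reminder of what that metric measures, here is a minimal sketch of the usual N50 definition: the length of the contig at which the running total, taken over contigs sorted from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the article.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the running sum first reaches half of the total length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy lengths in kbp (not data from the article).
contigs = [100, 200, 300, 400, 500]
# Joining 100+200 and 300+400 into two scaffolds, as scaffolding might do.
scaffolds = [300, 700, 500]

print(n50(contigs))    # N50 before joining
print(n50(scaffolds))  # N50 after joining
```

In this toy case, joining contigs leaves the total length unchanged while the N50 rises, which is the sense in which scaffolding improves contiguity.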
format Online
Article
Text
id pubmed-8519820
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Springer Berlin Heidelberg
record_format MEDLINE/PubMed
spelling pubmed-85198202021-10-29 Quinoa genome assembly employing genomic variation for guided scaffolding Bodrug-Schepers, Alexandrina Stralis-Pavese, Nancy Buerstmayr, Hermann Dohm, Juliane C. Himmelbauer, Heinz Theor Appl Genet Original Article KEY MESSAGE: We propose to use the natural variation between individuals of a population for genome assembly scaffolding. In today’s genome projects, multiple accessions get sequenced, leading to variant catalogs. Using such information to improve genome assemblies is attractive both cost-wise as well as scientifically, because the value of an assembly increases with its contiguity. We conclude that haplotype information is a valuable resource to group and order contigs toward the generation of pseudomolecules. ABSTRACT: Quinoa (Chenopodium quinoa) has been under cultivation in Latin America for more than 7500 years. Recently, quinoa has gained increasing attention due to its stress resistance and its nutritional value. We generated a novel quinoa genome assembly for the Bolivian accession CHEN125 using PacBio long-read sequencing data (assembly size 1.32 Gbp, initial N50 size 608 kbp). Next, we re-sequenced 50 quinoa accessions from Peru and Bolivia. This set of accessions differed at 4.4 million single-nucleotide variant (SNV) positions compared to CHEN125 (1.4 million SNV positions on average per accession). We show how to exploit variation in accessions that are distantly related to establish a genome-wide ordered set of contigs for guided scaffolding of a reference assembly. The method is based on detecting shared haplotypes and their expected continuity throughout the genome (i.e., the effect of linkage disequilibrium), as an extension of what is expected in mapping populations where only a few haplotypes are present. We test the approach using Arabidopsis thaliana data from different populations. 
After applying the method on our CHEN125 quinoa assembly we validated the results with mate-pairs, genetic markers, and another quinoa assembly originating from a Chilean cultivar. We show consistency between these information sources and the haplotype-based relations as determined by us and obtain an improved assembly with an N50 size of 1079 kbp and ordered contig groups of up to 39.7 Mbp. We conclude that haplotype information in distantly related individuals of the same species is a valuable resource to group and order contigs according to their adjacency in the genome toward the generation of pseudomolecules. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s00122-021-03915-x. Springer Berlin Heidelberg 2021-08-07 2021 /pmc/articles/PMC8519820/ /pubmed/34365519 http://dx.doi.org/10.1007/s00122-021-03915-x Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Original Article
Bodrug-Schepers, Alexandrina
Stralis-Pavese, Nancy
Buerstmayr, Hermann
Dohm, Juliane C.
Himmelbauer, Heinz
Quinoa genome assembly employing genomic variation for guided scaffolding
title Quinoa genome assembly employing genomic variation for guided scaffolding
title_full Quinoa genome assembly employing genomic variation for guided scaffolding
title_fullStr Quinoa genome assembly employing genomic variation for guided scaffolding
title_full_unstemmed Quinoa genome assembly employing genomic variation for guided scaffolding
title_short Quinoa genome assembly employing genomic variation for guided scaffolding
title_sort quinoa genome assembly employing genomic variation for guided scaffolding
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8519820/
https://www.ncbi.nlm.nih.gov/pubmed/34365519
http://dx.doi.org/10.1007/s00122-021-03915-x
work_keys_str_mv AT bodrugschepersalexandrina quinoagenomeassemblyemployinggenomicvariationforguidedscaffolding
AT stralispavesenancy quinoagenomeassemblyemployinggenomicvariationforguidedscaffolding
AT buerstmayrhermann quinoagenomeassemblyemployinggenomicvariationforguidedscaffolding
AT dohmjulianec quinoagenomeassemblyemployinggenomicvariationforguidedscaffolding
AT himmelbauerheinz quinoagenomeassemblyemployinggenomicvariationforguidedscaffolding