Quinoa genome assembly employing genomic variation for guided scaffolding
KEY MESSAGE: We propose to use the natural variation between individuals of a population for genome assembly scaffolding. In today’s genome projects, multiple accessions get sequenced, leading to variant catalogs. Using such information to improve genome assemblies is attractive both cost-wise as we...
Main authors: | Bodrug-Schepers, Alexandrina; Stralis-Pavese, Nancy; Buerstmayr, Hermann; Dohm, Juliane C.; Himmelbauer, Heinz |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer Berlin Heidelberg, 2021 |
Subjects: | Original Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8519820/ https://www.ncbi.nlm.nih.gov/pubmed/34365519 http://dx.doi.org/10.1007/s00122-021-03915-x |
_version_ | 1784584533287370752 |
---|---|
author | Bodrug-Schepers, Alexandrina Stralis-Pavese, Nancy Buerstmayr, Hermann Dohm, Juliane C. Himmelbauer, Heinz |
author_facet | Bodrug-Schepers, Alexandrina Stralis-Pavese, Nancy Buerstmayr, Hermann Dohm, Juliane C. Himmelbauer, Heinz |
author_sort | Bodrug-Schepers, Alexandrina |
collection | PubMed |
description | KEY MESSAGE: We propose to use the natural variation between individuals of a population for genome assembly scaffolding. In today’s genome projects, multiple accessions get sequenced, leading to variant catalogs. Using such information to improve genome assemblies is attractive both cost-wise as well as scientifically, because the value of an assembly increases with its contiguity. We conclude that haplotype information is a valuable resource to group and order contigs toward the generation of pseudomolecules. ABSTRACT: Quinoa (Chenopodium quinoa) has been under cultivation in Latin America for more than 7500 years. Recently, quinoa has gained increasing attention due to its stress resistance and its nutritional value. We generated a novel quinoa genome assembly for the Bolivian accession CHEN125 using PacBio long-read sequencing data (assembly size 1.32 Gbp, initial N50 size 608 kbp). Next, we re-sequenced 50 quinoa accessions from Peru and Bolivia. This set of accessions differed at 4.4 million single-nucleotide variant (SNV) positions compared to CHEN125 (1.4 million SNV positions on average per accession). We show how to exploit variation in accessions that are distantly related to establish a genome-wide ordered set of contigs for guided scaffolding of a reference assembly. The method is based on detecting shared haplotypes and their expected continuity throughout the genome (i.e., the effect of linkage disequilibrium), as an extension of what is expected in mapping populations where only a few haplotypes are present. We test the approach using Arabidopsis thaliana data from different populations. After applying the method on our CHEN125 quinoa assembly we validated the results with mate-pairs, genetic markers, and another quinoa assembly originating from a Chilean cultivar. We show consistency between these information sources and the haplotype-based relations as determined by us and obtain an improved assembly with an N50 size of 1079 kbp and ordered contig groups of up to 39.7 Mbp. We conclude that haplotype information in distantly related individuals of the same species is a valuable resource to group and order contigs according to their adjacency in the genome toward the generation of pseudomolecules. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s00122-021-03915-x. |
format | Online Article Text |
id | pubmed-8519820 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Springer Berlin Heidelberg |
record_format | MEDLINE/PubMed |
spelling | pubmed-85198202021-10-29 Quinoa genome assembly employing genomic variation for guided scaffolding Bodrug-Schepers, Alexandrina Stralis-Pavese, Nancy Buerstmayr, Hermann Dohm, Juliane C. Himmelbauer, Heinz Theor Appl Genet Original Article KEY MESSAGE: We propose to use the natural variation between individuals of a population for genome assembly scaffolding. In today’s genome projects, multiple accessions get sequenced, leading to variant catalogs. Using such information to improve genome assemblies is attractive both cost-wise as well as scientifically, because the value of an assembly increases with its contiguity. We conclude that haplotype information is a valuable resource to group and order contigs toward the generation of pseudomolecules. ABSTRACT: Quinoa (Chenopodium quinoa) has been under cultivation in Latin America for more than 7500 years. Recently, quinoa has gained increasing attention due to its stress resistance and its nutritional value. We generated a novel quinoa genome assembly for the Bolivian accession CHEN125 using PacBio long-read sequencing data (assembly size 1.32 Gbp, initial N50 size 608 kbp). Next, we re-sequenced 50 quinoa accessions from Peru and Bolivia. This set of accessions differed at 4.4 million single-nucleotide variant (SNV) positions compared to CHEN125 (1.4 million SNV positions on average per accession). We show how to exploit variation in accessions that are distantly related to establish a genome-wide ordered set of contigs for guided scaffolding of a reference assembly. The method is based on detecting shared haplotypes and their expected continuity throughout the genome (i.e., the effect of linkage disequilibrium), as an extension of what is expected in mapping populations where only a few haplotypes are present. We test the approach using Arabidopsis thaliana data from different populations. After applying the method on our CHEN125 quinoa assembly we validated the results with mate-pairs, genetic markers, and another quinoa assembly originating from a Chilean cultivar. We show consistency between these information sources and the haplotype-based relations as determined by us and obtain an improved assembly with an N50 size of 1079 kbp and ordered contig groups of up to 39.7 Mbp. We conclude that haplotype information in distantly related individuals of the same species is a valuable resource to group and order contigs according to their adjacency in the genome toward the generation of pseudomolecules. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s00122-021-03915-x. Springer Berlin Heidelberg 2021-08-07 2021 /pmc/articles/PMC8519820/ /pubmed/34365519 http://dx.doi.org/10.1007/s00122-021-03915-x Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . |
spellingShingle | Original Article Bodrug-Schepers, Alexandrina Stralis-Pavese, Nancy Buerstmayr, Hermann Dohm, Juliane C. Himmelbauer, Heinz Quinoa genome assembly employing genomic variation for guided scaffolding |
title | Quinoa genome assembly employing genomic variation for guided scaffolding |
title_full | Quinoa genome assembly employing genomic variation for guided scaffolding |
title_fullStr | Quinoa genome assembly employing genomic variation for guided scaffolding |
title_full_unstemmed | Quinoa genome assembly employing genomic variation for guided scaffolding |
title_short | Quinoa genome assembly employing genomic variation for guided scaffolding |
title_sort | quinoa genome assembly employing genomic variation for guided scaffolding |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8519820/ https://www.ncbi.nlm.nih.gov/pubmed/34365519 http://dx.doi.org/10.1007/s00122-021-03915-x |
work_keys_str_mv | AT bodrugschepersalexandrina quinoagenomeassemblyemployinggenomicvariationforguidedscaffolding AT stralispavesenancy quinoagenomeassemblyemployinggenomicvariationforguidedscaffolding AT buerstmayrhermann quinoagenomeassemblyemployinggenomicvariationforguidedscaffolding AT dohmjulianec quinoagenomeassemblyemployinggenomicvariationforguidedscaffolding AT himmelbauerheinz quinoagenomeassemblyemployinggenomicvariationforguidedscaffolding |
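
The description field reports assembly contiguity as N50 (608 kbp before and 1079 kbp after haplotype-guided scaffolding). As a reading aid, the following is a minimal sketch of how N50 is conventionally computed from a list of sequence lengths. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the article or its software.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first sequence that brings the cumulative
        # length to half of the assembly or more.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy illustration (hypothetical lengths, not quinoa data):
# joining two contigs into one scaffold raises the N50.
before = [5, 5, 5, 5]
after = [10, 5, 5]
print(n50(before), n50(after))
```

In this toy case the total size is unchanged, yet joining two contigs doubles the N50, which is the kind of contiguity gain the record describes for the scaffolded CHEN125 assembly.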