The draft nuclear genome assembly of Eucalyptus pauciflora: a pipeline for comparing de novo assemblies

BACKGROUND: Eucalyptus pauciflora (the snow gum) is a long-lived tree with high economic and ecological importance. Currently, little genomic information for E. pauciflora is available. Here, we sequentially assemble the genome of Eucalyptus pauciflora with different methods, and combine multiple ex...

Bibliographic Details
Main authors: Wang, Weiwen, Das, Ashutosh, Kainer, David, Schalamun, Miriam, Morales-Suarez, Alejandro, Schwessinger, Benjamin, Lanfear, Robert
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects: Data Note
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6939829/
https://www.ncbi.nlm.nih.gov/pubmed/31895413
http://dx.doi.org/10.1093/gigascience/giz160
author Wang, Weiwen
Das, Ashutosh
Kainer, David
Schalamun, Miriam
Morales-Suarez, Alejandro
Schwessinger, Benjamin
Lanfear, Robert
collection PubMed
description BACKGROUND: Eucalyptus pauciflora (the snow gum) is a long-lived tree with high economic and ecological importance. Currently, little genomic information for E. pauciflora is available. Here, we sequentially assemble the genome of E. pauciflora with different methods, and combine multiple existing and novel approaches to help select the best genome assembly. FINDINGS: We generated high coverage of long- (Nanopore, 174×) and short- (Illumina, 228×) read data from a single E. pauciflora individual and compared assemblies from 5 assemblers (Canu, SMARTdenovo, Flye, Marvel, and MaSuRCA) at 2 minimum read lengths (1 and 35 kb). A key component of our approach is to keep a randomly selected collection of ∼10% of both long and short reads separate from the assemblies, to use as a validation set for assessing them. Using this validation set along with a range of existing tools, we compared the assemblies in 8 ways: contig N50, BUSCO scores, LAI (long terminal repeat assembly index) scores, assembly ploidy, base-level error rate, CGAL (computing genome assembly likelihoods) scores, structural variation, and genome sequence similarity. Our results showed that MaSuRCA generated the best assembly, which is 594.87 Mb in size, with a contig N50 of 3.23 Mb and an estimated error rate of ∼0.006 errors per base. CONCLUSIONS: We report a draft genome of E. pauciflora, which will be a valuable resource for further genomic studies of eucalypts. The approaches for assessing and comparing genomes should help in assessing and choosing among many potential genome assemblies from a single dataset.
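Two ideas in the description above lend themselves to a small illustration: holding out roughly 10% of reads as a validation set, and the contig N50 statistic. The following is a minimal sketch only, not the authors' pipeline. The function names `split_validation` and `contig_n50`, the `fraction` and `seed` parameters, and the use of plain read identifiers instead of FASTQ records are all assumptions made for illustration.

```python
import random


def split_validation(read_ids, fraction=0.1, seed=0):
    """Randomly hold out about `fraction` of the reads.

    Hypothetical helper: returns (assembly_reads, validation_reads).
    Real data would be FASTQ records; plain identifiers are used here.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(read_ids)
    rng.shuffle(shuffled)
    n_held_out = round(len(shuffled) * fraction)
    return shuffled[n_held_out:], shuffled[:n_held_out]


def contig_n50(lengths):
    """Return the contig N50 of a list of contig lengths.

    N50 is the length of the contig at which, going from longest to
    shortest, the running total first reaches half of the assembly size.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


if __name__ == "__main__":
    # Toy inputs, invented for this example.
    assembly, validation = split_validation(range(100))
    print(len(assembly), len(validation))
    print(contig_n50([2, 3, 4, 5, 6]))
```

Because the held-out reads never enter any assembly, they can be mapped back to each candidate assembly as independent evidence, which is the role the description assigns to the validation set.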
format Online
Article
Text
id pubmed-6939829
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6939829 2020-01-07 Gigascience Data Note Oxford University Press 2020-01-02 /pmc/articles/PMC6939829/ /pubmed/31895413 http://dx.doi.org/10.1093/gigascience/giz160 Text en
© The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title The draft nuclear genome assembly of Eucalyptus pauciflora: a pipeline for comparing de novo assemblies
topic Data Note