Independent assessment and improvement of wheat genome sequence assemblies using Fosill jumping libraries

BACKGROUND: The accurate sequencing and assembly of very large, often polyploid, genomes remains a challenging task, limiting long-range sequence information and phased sequence variation for applications such as plant breeding. The 15-Gb hexaploid bread wheat (Triticum aestivum) genome has been par...

Bibliographic Details
Main Authors: Lu, Fu-Hao, McKenzie, Neil, Kettleborough, George, Heavens, Darren, Clark, Matthew D, Bevan, Michael W
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5967450/
https://www.ncbi.nlm.nih.gov/pubmed/29762659
http://dx.doi.org/10.1093/gigascience/giy053
author Lu, Fu-Hao
McKenzie, Neil
Kettleborough, George
Heavens, Darren
Clark, Matthew D
Bevan, Michael W
collection PubMed
description BACKGROUND: The accurate sequencing and assembly of very large, often polyploid, genomes remains a challenging task, limiting long-range sequence information and phased sequence variation for applications such as plant breeding. The 15-Gb hexaploid bread wheat (Triticum aestivum) genome has been particularly challenging to sequence, and several different approaches have recently generated long-range assemblies. Mapping and understanding the types of assembly errors are important for optimising future sequencing and assembly approaches and for comparative genomics. RESULTS: Here we use a Fosill 38-kb jumping library to assess medium and longer–range order of different publicly available wheat genome assemblies. Modifications to the Fosill protocol generated longer Illumina sequences and enabled comprehensive genome coverage. Analyses of two independent Bacterial Artificial Chromosome (BAC)-based chromosome-scale assemblies, two independent Illumina whole genome shotgun assemblies, and a hybrid Single Molecule Real Time (SMRT-PacBio) and short read (Illumina) assembly were carried out. We revealed a surprising scale and variety of discrepancies using Fosill mate-pair mapping and validated several of each class. In addition, Fosill mate-pairs were used to scaffold a whole genome Illumina assembly, leading to a 3-fold increase in N50 values. CONCLUSIONS: Our analyses, using an independent means to validate different wheat genome assemblies, show that whole genome shotgun assemblies based solely on Illumina sequences are significantly more accurate by all measures compared to BAC-based chromosome-scale assemblies and hybrid SMRT-Illumina approaches. Although current whole genome assemblies are reasonably accurate and useful, additional improvements will be needed to generate complete assemblies of wheat genomes using open-source, computationally efficient, and cost-effective methods.
format Online
Article
Text
id pubmed-5967450
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5967450 2018-06-04 Gigascience Research Oxford University Press 2018-05-11 /pmc/articles/PMC5967450/ /pubmed/29762659 http://dx.doi.org/10.1093/gigascience/giy053 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Independent assessment and improvement of wheat genome sequence assemblies using Fosill jumping libraries
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5967450/
https://www.ncbi.nlm.nih.gov/pubmed/29762659
http://dx.doi.org/10.1093/gigascience/giy053