Analysing complex Triticeae genomes – concepts and strategies

Bibliographic details
Main authors: Spannagl, Manuel; Martis, Mihaela M; Pfeifer, Matthias; Nussbaumer, Thomas; Mayer, Klaus FX
Format: Online, Article, Text
Language: English
Published: BioMed Central 2013
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3847682/
https://www.ncbi.nlm.nih.gov/pubmed/24011260
http://dx.doi.org/10.1186/1746-4811-9-35

Abstract: The genomic sequences of many important Triticeae crop species are hard to assemble and analyse because of their large size, (partly) polyploid nature and high repeat content. Cost-efficient and fast NGS technologies have recently made it possible to report draft genomes of barley and bread wheat. The barley genome is estimated at 5 Gb, whereas the allohexaploid bread wheat genome is 17 Gb. The high repeat content hampers both direct assembly of the sequence reads and access to the gene content. As a consequence, novel strategies and data analysis concepts had to be developed to provide much-needed whole-genome sequence surveys and access to the gene repertoires. Here we describe analytical strategies that make it possible to structure the massive NGS data sets generated and pave the way towards ordered sequence data and gene order. Specifically, we report on the GenomeZipper, a synteny-driven approach to order and structure NGS survey sequences of grass genomes that lack a physical map. In addition, to access and analyse the gene repertoire of allohexaploid bread wheat directly from the raw sequence reads, a reference-guided approach was developed that uses representative genes from rice, Brachypodium distachyon, sorghum and barley. Stringent sub-assembly on these reference genes prevented homeologous wheat genes from collapsing into a single assembly, and made it possible to estimate gene retention rates and determine gene family sizes. Genomic sequences from the progenitors of the wheat sub-genomes made it possible to assign a large number of sub-assemblies to the wheat A, B or D sub-genome using machine learning algorithms. Many of the concepts outlined here can readily be applied to other complex plant and non-plant genomes.
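
The core idea behind a synteny-driven approach such as the GenomeZipper, placing survey sequences along the conserved gene order of a related reference genome, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published GenomeZipper pipeline: it assumes a single syntenic block whose reference gene order is already known (for example from rice, Brachypodium or sorghum), assumes each survey contig has already been aligned to reference genes with a percent-identity score, and leaves out how a real pipeline would anchor syntenic blocks to the target genome (for example through genetic markers). The `Hit` record, the `zipper_order` function and the `min_identity` cutoff are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Hit:
    """Hypothetical alignment record: one survey contig against one reference gene."""
    contig: str
    ref_gene: str
    identity: float  # percent identity of the alignment


def zipper_order(
    ref_gene_order: List[str],
    hits: Iterable[Hit],
    min_identity: float = 75.0,  # hypothetical cutoff, not taken from the paper
) -> List[Tuple[int, str, str]]:
    """Place survey contigs along a syntenic reference gene order.

    Each contig is assigned to the reference gene it matches best (highest
    identity at or above min_identity); contigs are then sorted by the
    position of that gene. Contigs without a qualifying hit stay unplaced.
    Returns (reference position, reference gene, contig) tuples.
    """
    position: Dict[str, int] = {gene: i for i, gene in enumerate(ref_gene_order)}
    best: Dict[str, Hit] = {}
    for hit in hits:
        # Skip weak hits and hits to genes outside this syntenic block.
        if hit.identity < min_identity or hit.ref_gene not in position:
            continue
        current = best.get(hit.contig)
        if current is None or hit.identity > current.identity:
            best[hit.contig] = hit
    placed = [(position[h.ref_gene], h.ref_gene, h.contig) for h in best.values()]
    return sorted(placed)


if __name__ == "__main__":
    reference = ["r1", "r2", "r3"]  # hypothetical reference gene IDs, in order
    alignments = [
        Hit("ctgA", "r3", 90.0),
        Hit("ctgB", "r1", 88.0),
        Hit("ctgB", "r3", 70.0),  # below the default cutoff, ignored
        Hit("ctgC", "r2", 60.0),  # below the default cutoff, so ctgC stays unplaced
    ]
    for pos, gene, contig in zipper_order(reference, alignments):
        print(pos, gene, contig)
```

With the default cutoff, `ctgB` is placed at the first reference gene, `ctgA` at the third, and `ctgC` is left out: a weak or missing hit leaves a gap in the ordered "zipper" rather than forcing a placement. Keeping only the best hit per contig is the simplest way to resolve contigs that match several reference genes; a real pipeline would need more careful handling of paralogs and rearranged syntenic blocks.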

Journal: Plant Methods (Review), BioMed Central, published 2013-09-06
Copyright © 2013 Spannagl et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.