Analysing complex Triticeae genomes – concepts and strategies
The genomic sequences of many important Triticeae crop species are hard to assemble and analyse due to their large genome sizes, (in part) polyploid genomes and high repeat content. Recently, the draft genomes of barley and bread wheat were reported thanks to cost-efficient and fast NGS technologies...
Main authors: | Spannagl, Manuel; Martis, Mihaela M; Pfeifer, Matthias; Nussbaumer, Thomas; Mayer, Klaus FX |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2013 |
Subjects: | Review |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3847682/ https://www.ncbi.nlm.nih.gov/pubmed/24011260 http://dx.doi.org/10.1186/1746-4811-9-35 |
_version_ | 1782293643433869312 |
---|---|
author | Spannagl, Manuel; Martis, Mihaela M; Pfeifer, Matthias; Nussbaumer, Thomas; Mayer, Klaus FX |
author_sort | Spannagl, Manuel |
collection | PubMed |
description | The genomic sequences of many important Triticeae crop species are hard to assemble and analyse because of their large genome sizes, (partly) polyploid genomes and high repeat content. Recently, draft genomes of barley and bread wheat were reported, made possible by fast and cost-efficient NGS technologies. The barley genome is estimated to be 5 Gb in size, whereas the allo-hexaploid genome of bread wheat amounts to 17 Gb. The high repeat content hampers direct assembly of the sequence reads and access to the gene content. As a consequence, novel strategies and data analysis concepts had to be developed to provide much-needed whole-genome sequence surveys and access to the gene repertoires. Here we describe analytical strategies that help structure the massive NGS data sets generated and pave the way towards ordered sequence data and gene order. Specifically, we report on the GenomeZipper, a synteny-driven approach to order and structure NGS survey sequences of grass genomes that lack a physical map. In addition, to access and analyse the gene repertoire of allo-hexaploid bread wheat from the raw sequence reads, a reference-guided approach was developed that uses representative genes from rice, Brachypodium distachyon, sorghum and barley. Stringent sub-assembly on the reference genes prevented homeologous wheat genes from collapsing and made it possible to estimate the gene retention rate and determine gene family sizes. Genomic sequences from the wheat sub-genome progenitors made it possible to assign a large number of sub-assemblies to the wheat A, B or D sub-genome using machine learning algorithms. Many of the concepts outlined here can readily be applied to other complex plant and non-plant genomes. |
format | Online Article Text |
id | pubmed-3847682 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3847682 2013-12-04 Analysing complex Triticeae genomes – concepts and strategies Spannagl, Manuel; Martis, Mihaela M; Pfeifer, Matthias; Nussbaumer, Thomas; Mayer, Klaus FX Plant Methods Review BioMed Central 2013-09-06 /pmc/articles/PMC3847682/ /pubmed/24011260 http://dx.doi.org/10.1186/1746-4811-9-35 Text en Copyright © 2013 Spannagl et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Analysing complex Triticeae genomes – concepts and strategies |
topic | Review |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3847682/ https://www.ncbi.nlm.nih.gov/pubmed/24011260 http://dx.doi.org/10.1186/1746-4811-9-35 |