Separating homeologs by phasing in the tetraploid wheat transcriptome

BACKGROUND: The high level of identity among duplicated homoeologous genomes in tetraploid pasta wheat presents substantial challenges for de novo transcriptome assembly. To solve this problem, we develop a specialized bioinformatics workflow that optimizes transcriptome assembly and separation of m...

Bibliographic Details
Main authors: Krasileva, Ksenia V; Buffalo, Vince; Bailey, Paul; Pearce, Stephen; Ayling, Sarah; Tabbita, Facundo; Soria, Marcelo; Wang, Shichen; Akhunov, Eduard; Uauy, Cristobal; Dubcovsky, Jorge
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4053977/
https://www.ncbi.nlm.nih.gov/pubmed/23800085
http://dx.doi.org/10.1186/gb-2013-14-6-r66
_version_ 1782320480745684992
author Krasileva, Ksenia V
Buffalo, Vince
Bailey, Paul
Pearce, Stephen
Ayling, Sarah
Tabbita, Facundo
Soria, Marcelo
Wang, Shichen
Akhunov, Eduard
Uauy, Cristobal
Dubcovsky, Jorge
collection PubMed
description BACKGROUND: The high level of identity among duplicated homoeologous genomes in tetraploid pasta wheat presents substantial challenges for de novo transcriptome assembly. To solve this problem, we develop a specialized bioinformatics workflow that optimizes transcriptome assembly and separation of merged homoeologs. To evaluate our strategy, we sequence and assemble the transcriptome of one of the diploid ancestors of pasta wheat, and compare both assemblies with a benchmark set of 13,472 full-length, non-redundant bread wheat cDNAs. RESULTS: A total of 489 million 100 bp paired-end reads from tetraploid wheat assemble into 140,118 contigs, including 96% of the benchmark cDNAs. We use a comparative genomics approach to annotate 66,633 open reading frames. The multiple k-mer assembly strategy increases the proportion of cDNAs assembled full-length in a single contig by 22% relative to the best single k-mer size. Homoeologs are separated using a post-assembly pipeline that includes polymorphism identification, phasing of SNPs, read sorting, and re-assembly of phased reads. Using a reference set of genes, we determine that 98.7% of SNPs analyzed are correctly separated by phasing. CONCLUSIONS: Our study shows that de novo transcriptome assembly of tetraploid wheat benefits from multiple k-mer assembly strategies more than that of diploid wheat. Our results also demonstrate that phasing approaches originally designed for heterozygous diploid organisms can be used to separate the close homoeologous genomes of tetraploid wheat. The predicted tetraploid wheat proteome and gene models provide a valuable tool for the wheat research community and for those interested in comparative genomic studies.
format Online
Article
Text
id pubmed-4053977
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4053977 2014-06-13 Separating homeologs by phasing in the tetraploid wheat transcriptome Krasileva, Ksenia V Buffalo, Vince Bailey, Paul Pearce, Stephen Ayling, Sarah Tabbita, Facundo Soria, Marcelo Wang, Shichen Akhunov, Eduard Uauy, Cristobal Dubcovsky, Jorge Genome Biol Research BioMed Central 2013 2013-06-25 /pmc/articles/PMC4053977/ /pubmed/23800085 http://dx.doi.org/10.1186/gb-2013-14-6-r66 Text en Copyright © 2013 Krasileva et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Separating homeologs by phasing in the tetraploid wheat transcriptome
topic Research