
Combining Transcriptome Assemblies from Multiple De Novo Assemblers in the Allo-Tetraploid Plant Nicotiana benthamiana

BACKGROUND: Nicotiana benthamiana is an allo-tetraploid plant, which can be challenging for de novo transcriptome assemblies due to homeologous and duplicated gene copies. Transcripts generated from such genes can be distinct yet highly similar in sequence, with markedly differing expression levels....


Bibliographic Details
Main authors: Nakasugi, Kenlee; Crowhurst, Ross; Bally, Julia; Waterhouse, Peter
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3948916/
https://www.ncbi.nlm.nih.gov/pubmed/24614631
http://dx.doi.org/10.1371/journal.pone.0091776
author Nakasugi, Kenlee
Crowhurst, Ross
Bally, Julia
Waterhouse, Peter
collection PubMed
description BACKGROUND: Nicotiana benthamiana is an allo-tetraploid plant, which can be challenging for de novo transcriptome assemblies due to homeologous and duplicated gene copies. Transcripts generated from such genes can be distinct yet highly similar in sequence, with markedly differing expression levels. This can lead to unassembled, partially assembled or mis-assembled contigs. Due to the different properties of de novo assemblers, no one assembler with any one given parameter space can re-assemble all possible transcripts from a transcriptome. RESULTS: In an effort to maximise the diversity and completeness of de novo assembled transcripts, we utilised four de novo transcriptome assemblers, TransAbyss, Trinity, SOAPdenovo-Trans, and Oases, using a range of k-mer sizes and different input RNA-seq read counts. We complemented the parameter space biologically by using RNA from 10 plant tissues. We then combined the output of all assemblies into a large super-set of sequences. Using a method from the EvidentialGene pipeline, the combined assembly was reduced from 9.9 million de novo assembled transcripts to about 235,000 of which about 50,000 were classified as primary. Metrics such as average bit-scores, feature response curves and the ability to distinguish paralogous or homeologous transcripts, indicated that the EvidentialGene processed assembly was of high quality. Of 35 RNA silencing gene transcripts, 34 were identified as assembled to full length, whereas in a previous assembly using only one assembler, 9 of these were partially assembled. CONCLUSIONS: To achieve a high-quality transcriptome, it is advantageous to implement and combine the output from as many different de novo assemblers as possible. We have in essence taken the ‘best’ output from each assembler while minimising sequence redundancy. We have also shown that simultaneous assessment of a variety of metrics, not just focused on contig length, is necessary to gauge the quality of assemblies.
format Online
Article
Text
id pubmed-3948916
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3948916 2014-03-13 Combining Transcriptome Assemblies from Multiple De Novo Assemblers in the Allo-Tetraploid Plant Nicotiana benthamiana Nakasugi, Kenlee Crowhurst, Ross Bally, Julia Waterhouse, Peter PLoS One Research Article Public Library of Science 2014-03-10 /pmc/articles/PMC3948916/ /pubmed/24614631 http://dx.doi.org/10.1371/journal.pone.0091776 Text en © 2014 Nakasugi et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Combining Transcriptome Assemblies from Multiple De Novo Assemblers in the Allo-Tetraploid Plant Nicotiana benthamiana
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3948916/
https://www.ncbi.nlm.nih.gov/pubmed/24614631
http://dx.doi.org/10.1371/journal.pone.0091776