A consensus approach to vertebrate de novo transcriptome assembly from RNA-seq data: assembly of the duck (Anas platyrhynchos) transcriptome

For vertebrate organisms where a reference genome is not available, de novo transcriptome assembly enables a cost effective insight into the identification of tissue specific or differentially expressed genes and variation of the coding part of the genome. However, since there are a number of differ...

Bibliographic Details
Main authors: Moreton, Joanna; Dunham, Stephen P.; Emes, Richard D.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2014
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4070175/
https://www.ncbi.nlm.nih.gov/pubmed/25009556
http://dx.doi.org/10.3389/fgene.2014.00190
collection PubMed
description For vertebrate organisms where a reference genome is not available, de novo transcriptome assembly offers cost-effective insight into tissue-specific or differentially expressed genes and into variation in the coding part of the genome. However, because a number of different tools and parameters can be used to reconstruct transcripts, it is difficult to determine an optimal method. Here we suggest a pipeline based on (1) assessing the performance of three different assembly tools, (2) using both single and multiple k-mer (MK) approaches, (3) examining the influence of the number of reads used in the assembly, and (4) merging assemblies from different tools. We use an example dataset from the vertebrate Anas platyrhynchos domestica (Pekin duck). We find that taking a subset of data enables a robust assembly to be produced by multiple methods without the need for very high memory capacity. The use of reads mapped back to transcripts (RMBT) and CEGMA (Core Eukaryotic Genes Mapping Approach) provides useful metrics to determine the completeness of the assembly obtained. For this dataset, the use of MK generated a more complete assembly, as measured by a greater number of RMBT and a higher CEGMA score. Merged single k-mer assemblies are generally smaller but consist of longer transcripts, suggesting an assembly with fewer fragmented transcripts. We suggest that the use of a subset of reads during assembly allows the relatively rapid investigation of assembly characteristics and can guide the user to the most appropriate transcriptome for a particular downstream use. Transcriptomes generated by the compared assembly methods and the final merged assembly are freely available for download at http://dx.doi.org/10.6084/m9.figshare.1032613.
id pubmed-4070175
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4070175 2014-07-09 Moreton, Joanna; Dunham, Stephen P.; Emes, Richard D. Front Genet Genetics Frontiers Media S.A. 2014-06-25 /pmc/articles/PMC4070175/ /pubmed/25009556 http://dx.doi.org/10.3389/fgene.2014.00190 Text en
Copyright © 2014 Moreton, Dunham and Emes. http://creativecommons.org/licenses/by/3.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
topic Genetics