
FRAMA: from RNA-seq data to annotated mRNA assemblies



Bibliographic Details
Main authors:	Bens, Martin; Sahm, Arne; Groth, Marco; Jahn, Niels; Morhart, Michaela; Holtze, Susanne; Hildebrandt, Thomas B.; Platzer, Matthias; Szafranski, Karol
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2016
Subjects:	Software
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4712544/ https://www.ncbi.nlm.nih.gov/pubmed/26763976 http://dx.doi.org/10.1186/s12864-015-2349-8

author	Bens, Martin; Sahm, Arne; Groth, Marco; Jahn, Niels; Morhart, Michaela; Holtze, Susanne; Hildebrandt, Thomas B.; Platzer, Matthias; Szafranski, Karol
collection	PubMed
description	BACKGROUND: Advances in second-generation sequencing of RNA made a near-complete characterization of transcriptomes affordable. However, the reconstruction of full-length mRNAs via de novo RNA-seq assembly is still difficult due to the complexity of eukaryote transcriptomes with highly similar paralogs and multiple alternative splice variants. Here, we present FRAMA, a genome-independent annotation tool for de novo mRNA assemblies that addresses several post-assembly tasks, such as reduction of contig redundancy, ortholog assignment, correction of misassembled transcripts, scaffolding of fragmented transcripts and coding sequence identification.
RESULTS: We applied FRAMA to assemble and annotate the transcriptome of the naked mole-rat and assessed the quality of the obtained compilation of transcripts with the aid of publicly available naked mole-rat gene annotations. Based on a de novo transcriptome assembly (Trinity), FRAMA annotated 21,984 naked mole-rat mRNAs (12,100 full-length CDSs), corresponding to 16,887 genes. The scaffolding of 3488 genes increased the median sequence information 1.27-fold. In total, FRAMA detected and corrected 4774 misassembled genes, which were predominantly caused by fusion of genes. A comparison with three different sources of naked mole-rat transcripts reveals that FRAMA's gene models are better supported by RNA-seq data than any other transcript set. Further, our results demonstrate that FRAMA is competitive with state-of-the-art genome-based transcript reconstruction approaches.
CONCLUSION: FRAMA realizes the de novo construction of a low-redundancy transcript catalog for eukaryotes, including the extension and refinement of transcripts. Thereby, results delivered by FRAMA provide the basis for comprehensive downstream analyses like gene expression studies or comparative transcriptomics. FRAMA is available at https://github.com/gengit/FRAMA.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-2349-8) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-4712544
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	BioMed Central
record_format	MEDLINE/PubMed
source	BMC Genomics, BioMed Central, 2016-01-14
rights	© Bens et al. 2016. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	FRAMA: from RNA-seq data to annotated mRNA assemblies
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4712544/ https://www.ncbi.nlm.nih.gov/pubmed/26763976 http://dx.doi.org/10.1186/s12864-015-2349-8

