
STAble: a novel approach to de novo assembly of RNA-seq data and its application in a metabolic model network based metatranscriptomic workflow


Bibliographic Details
Main Authors: Saggese, Igor; Bona, Elisa; Conway, Max; Favero, Francesco; Ladetto, Marco; Liò, Pietro; Manzini, Giovanni; Mignone, Flavio
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6069750/
https://www.ncbi.nlm.nih.gov/pubmed/30066630
http://dx.doi.org/10.1186/s12859-018-2174-6
author Saggese, Igor
Bona, Elisa
Conway, Max
Favero, Francesco
Ladetto, Marco
Liò, Pietro
Manzini, Giovanni
Mignone, Flavio
collection PubMed
description BACKGROUND: De novo assembly of RNA-seq data allows the study of the transcriptome in the absence of a reference genome, whether the data are obtained from a single organism or from a mixed sample, as in metatranscriptomics studies. Given the high number of sequences obtained from NGS approaches, a critical step in any analysis workflow is the assembly of reads to reconstruct transcripts, thus reducing the complexity of the analysis. Although many available tools show good sensitivity, there is a high percentage of false positives due to the high number of assemblies considered, and it is likely that the frequency of false positives is underestimated by currently used benchmarks. The reconstruction of non-existent transcripts may distort the biological interpretation of results; for example, it may lead to overestimating the number of “novel” transcripts identified. Moreover, benchmarks are usually based on RNA-seq data from annotated genomes, and assembled transcripts are compared to annotations and genomes to identify putatively correct and incorrect reconstructions; these tests alone, however, may lead to accepting a particular type of false positive as true, as described in more detail in the article.
RESULTS: Here we present a novel methodology for de novo assembly, implemented in software named STAble (Short-reads Transcriptome Assembler). The novel concept of this assembler is that whole reads, rather than smaller k-mers, are used to determine possible alignments, with the aim of reducing the number of chimeras produced. Furthermore, we applied a new set of benchmarks based on simulated data to better define the performance of the assembly method and to carefully identify true reconstructions. STAble was also used to build a prototype workflow to analyse metatranscriptomics data in connection with a steady-state metabolic modelling algorithm. This algorithm was used to produce high-quality metabolic interpretations of small gene expression sets obtained from already published RNA-seq data that we assembled with STAble.
CONCLUSIONS: The presented results, albeit preliminary, clearly suggest that with this approach it is possible to identify informative reactions not directly revealed by raw transcriptomic data.
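The contrast the abstract draws between whole-read alignment and k-mer decomposition can be illustrated with a small sketch. This is not STAble's implementation, which the record does not describe; the function names, the toy reads, and the `min_len` threshold are hypothetical and chosen only for illustration. The sketch shows two reads that share a k-mer, and would therefore be linked in a k-mer graph, but have no suffix-prefix overlap when compared as whole reads.

```python
def suffix_prefix_overlap(a, b, min_len=3):
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    (Hypothetical helper; exact matching only, no sequencing errors.)
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def shared_kmers(a, b, k):
    """Return the set of k-mers that occur in both reads."""
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return kmers_a & kmers_b


# Toy reads (made up for this example).
read1 = "ACGTTGCA"
read2 = "TTGCAGGT"   # genuinely continues read1
read3 = "CCTTGCCC"   # only shares an internal 4-mer with read1

print(suffix_prefix_overlap(read1, read2))  # whole-read overlap length
print(shared_kmers(read1, read3, 4))        # k-mer link between read1 and read3
print(suffix_prefix_overlap(read1, read3))  # no whole-read overlap
```

In this toy case a k-mer graph would connect read1 and read3 through the shared 4-mer, which is one way a chimeric path can arise, whereas the whole-read comparison links only read1 and read2.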
format Online
Article
Text
id pubmed-6069750
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6069750 2018-08-03 BMC Bioinformatics Research BioMed Central 2018-07-09 /pmc/articles/PMC6069750/ /pubmed/30066630 http://dx.doi.org/10.1186/s12859-018-2174-6 Text en © The Author(s). 2018 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title STAble: a novel approach to de novo assembly of RNA-seq data and its application in a metabolic model network based metatranscriptomic workflow
topic Research