TransFlow: a modular framework for assembling and assessing accurate de novo transcriptomes in non-model organisms

BACKGROUND: The advances in high-throughput sequencing technologies are allowing more and more de novo assembling of transcriptomes from many new organisms. Some degree of automation and evaluation is required to warrant reproducibility, repetitivity and the selection of the best possible transcript...

Bibliographic Details
Main authors: Seoane, Pedro, Espigares, Marina, Carmona, Rosario, Polonio, Álvaro, Quintana, Julia, Cretazzo, Enrico, Bota, Josefina, Pérez-García, Alejandro, Dios Alché, Juan de, Gómez, Luis, Claros, M. Gonzalo
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6245506/
https://www.ncbi.nlm.nih.gov/pubmed/30453874
http://dx.doi.org/10.1186/s12859-018-2384-y
_version_ 1783372256594886656
author Seoane, Pedro
Espigares, Marina
Carmona, Rosario
Polonio, Álvaro
Quintana, Julia
Cretazzo, Enrico
Bota, Josefina
Pérez-García, Alejandro
Dios Alché, Juan de
Gómez, Luis
Claros, M. Gonzalo
collection PubMed
description BACKGROUND: Advances in high-throughput sequencing technologies are enabling the de novo assembly of transcriptomes from a growing number of new organisms. Some degree of automation and evaluation is required to ensure reproducibility, repeatability and the selection of the best possible transcriptome. Workflows and pipelines are becoming an absolute requirement for this purpose, but the problem of evaluating de novo transcriptome assemblies in organisms lacking a sequenced genome remains unsolved. An automated, reproducible and flexible framework called TransFlow is described to accomplish this task. RESULTS: TransFlow, with its five independent modules, was designed to build different workflows depending on the nature of the original reads. This architecture enables different combinations of Illumina and Roche/454 sequencing data, and can be extended to other sequencing platforms. Its capabilities are illustrated with the selection of reliable plant reference transcriptomes and the assembly of six transcriptomes (three case studies for grapevine leaves, olive tree pollen, and chestnut stem, and three more for haustorium, epiphytic structures and their combination for the phytopathogenic fungus Podosphaera xanthii). Arabidopsis and poplar transcriptomes proved to be the best references. A common result regarding de novo assemblies is that Illumina paired-end reads of 100 nt in length assembled with OASES can provide reliable transcriptomes, while the contribution of longer reads is noticeable only when they complement a set of short single reads. CONCLUSIONS: TransFlow can handle up to 181 different assembling strategies. Evaluation based on principal component analyses allows it to adapt itself to different sets of reads and to provide a suitable transcriptome for each combination of reads and assemblers. 
As a result, each case study has its own behaviour and prioritises its own evaluation parameters, and the framework gives an objective and automated way of detecting the best transcriptome within a pool of them. Sequencing data type and quantity (preferably several hundred million reads of 2×100 nt or longer), assemblers (OASES for Illumina, MIRA4 and EULER-SR reconciled with CAP3 for Roche/454) and strategy (preferably scaffolding with OASES, and probably merging with Roche/454 when available) emerge as the most influential factors. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2384-y) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6245506
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6245506 2018-11-26 TransFlow: a modular framework for assembling and assessing accurate de novo transcriptomes in non-model organisms BMC Bioinformatics Research BioMed Central 2018-11-20 /pmc/articles/PMC6245506/ /pubmed/30453874 http://dx.doi.org/10.1186/s12859-018-2384-y Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title TransFlow: a modular framework for assembling and assessing accurate de novo transcriptomes in non-model organisms
topic Research