A consensus-based ensemble approach to improve transcriptome assembly

BACKGROUND: Systems-level analyses, such as differential gene expression analysis, co-expression analysis, and metabolic pathway reconstruction, depend on the accuracy of the transcriptome. Multiple tools exist to perform transcriptome assembly from RNAseq data. However, assembling high quality tran...

Bibliographic Details
Main authors: Voshall, Adam, Behera, Sairam, Li, Xiangjun, Yu, Xiao-Hong, Kapil, Kushagra, Deogun, Jitender S., Shanklin, John, Cahoon, Edgar B., Moriyama, Etsuko N.
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8532302/
https://www.ncbi.nlm.nih.gov/pubmed/34674629
http://dx.doi.org/10.1186/s12859-021-04434-8
author Voshall, Adam
Behera, Sairam
Li, Xiangjun
Yu, Xiao-Hong
Kapil, Kushagra
Deogun, Jitender S.
Shanklin, John
Cahoon, Edgar B.
Moriyama, Etsuko N.
author_sort Voshall, Adam
collection PubMed
description BACKGROUND: Systems-level analyses, such as differential gene expression analysis, co-expression analysis, and metabolic pathway reconstruction, depend on the accuracy of the transcriptome. Multiple tools exist to perform transcriptome assembly from RNAseq data. However, assembling high-quality transcriptomes is still not a trivial problem. This is especially the case for non-model organisms, where adequate reference genomes are often not available. Different methods produce different transcriptome models, and there is no easy way to determine which are more accurate. Furthermore, alternative-splicing events exacerbate these already difficult assembly problems. While benchmarking transcriptome assemblies is critical, it is also not trivial because true reference transcriptomes are generally lacking. RESULTS: In this study, we first provide a pipeline to generate a set of simulated benchmark transcriptomes and corresponding RNAseq data. Using the simulated benchmarking datasets, we compared the performance of various transcriptome assembly approaches, including both de novo and genome-guided methods. The results showed that assembly performance deteriorates significantly when alternative transcripts (isoforms) exist or, for genome-guided methods, when a reference from the same genome is not available. To improve transcriptome assembly performance by leveraging the overlapping predictions between different assemblies, we present a new consensus-based ensemble transcriptome assembly approach, ConSemble. CONCLUSIONS: Without using a reference genome, ConSemble using four de novo assemblers achieved an accuracy up to twice as high as that of any individual de novo assembler we compared. When a reference genome is available, ConSemble using four genome-guided assemblies removed many incorrectly assembled contigs with minimal impact on correctly assembled contigs, achieving higher precision and accuracy than individual genome-guided methods. Furthermore, ConSemble using de novo assemblers matched or exceeded the best-performing genome-guided assemblers even when the transcriptomes included isoforms. We thus demonstrated that the ConSemble consensus strategy can improve transcriptome assembly for both de novo and genome-guided assemblers. The RNAseq simulation pipeline, the benchmark transcriptome datasets, and the script to perform the ConSemble assembly are all freely available from: http://bioinfolab.unl.edu/emlab/consemble/. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04434-8.
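The consensus idea in the abstract (keeping predictions that overlap between different assemblies) can be illustrated with a small voting sketch. This is not the ConSemble script: the abstract does not state how overlapping predictions are matched or how many assemblers must agree, so the exact-sequence matching, the `min_support=3` threshold, the function names, the assembler labels, and the toy sequences below are all assumptions made for illustration only.

```python
from collections import Counter


def consensus_contigs(assemblies, min_support=3):
    """Return the sequences reported by at least `min_support` assemblers.

    `assemblies` maps an assembler label to its list of contig sequences.
    Matching by exact sequence identity is an assumption of this sketch.
    """
    votes = Counter()
    for contigs in assemblies.values():
        # Use a set so each assembler votes at most once per sequence.
        votes.update(set(contigs))
    return {seq for seq, n in votes.items() if n >= min_support}


def precision_recall(predicted, benchmark):
    """Precision and recall of a predicted set against a benchmark set."""
    true_positives = len(predicted & benchmark)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(benchmark) if benchmark else 0.0
    return precision, recall


# Toy data (hypothetical): a three-transcript benchmark and four assemblies.
benchmark = {"ATGAAA", "ATGCCC", "ATGGGG"}
assemblies = {
    "assembler_a": ["ATGAAA", "ATGCCC", "ATGTTT"],
    "assembler_b": ["ATGAAA", "ATGCCC", "ATGGGG"],
    "assembler_c": ["ATGAAA", "ATGCCC", "ATGACG"],
    "assembler_d": ["ATGAAA", "ATGGGG", "ATGTTT"],
}

consensus = consensus_contigs(assemblies, min_support=3)
single = set(assemblies["assembler_a"])

print("consensus:", sorted(consensus))
print("consensus precision/recall:", precision_recall(consensus, benchmark))
print("assembler_a precision/recall:", precision_recall(single, benchmark))
```

In this toy case the vote discards the contigs that only one or two assemblers produced, which raises precision relative to `assembler_a` alone without lowering its recall. A lower `min_support` would trade precision for recall, so the threshold is a design choice rather than a fixed rule.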
format Online
Article
Text
id pubmed-8532302
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8532302 2021-10-25 A consensus-based ensemble approach to improve transcriptome assembly Voshall, Adam Behera, Sairam Li, Xiangjun Yu, Xiao-Hong Kapil, Kushagra Deogun, Jitender S. Shanklin, John Cahoon, Edgar B. Moriyama, Etsuko N. BMC Bioinformatics Methodology Article BioMed Central 2021-10-21 /pmc/articles/PMC8532302/ /pubmed/34674629 http://dx.doi.org/10.1186/s12859-021-04434-8 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title A consensus-based ensemble approach to improve transcriptome assembly
topic Methodology Article