Optimizing an efficient ensemble approach for high-quality de novo transcriptome assembly of Thymus daenensis

A well-optimized, error-free transcriptome assembly is a crucial prerequisite for reliable downstream analyses. Each de novo assembler has algorithm-dependent strengths and weaknesses in handling assembly issues and should be tested specifically for each dataset. Here, we examined the efficiency of...

Bibliographic Details
Main Authors: Ahmadi, Hosein, Sheikh-Assadi, Morteza, Fatahi, Reza, Zamani, Zabihollah, Shokrpour, Majid
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10390528/
https://www.ncbi.nlm.nih.gov/pubmed/37524806
http://dx.doi.org/10.1038/s41598-023-39620-6
author Ahmadi, Hosein
Sheikh-Assadi, Morteza
Fatahi, Reza
Zamani, Zabihollah
Shokrpour, Majid
collection PubMed
description A well-optimized, error-free transcriptome assembly is a crucial prerequisite for reliable downstream analyses. Each de novo assembler has algorithm-dependent strengths and weaknesses in handling assembly issues and should be tested specifically for each dataset. Here, we examined the efficiency of seven state-of-the-art assemblers on ~30 Gb of data obtained from mRNA sequencing of Thymus daenensis. In an ensemble workflow, combining the outputs of different assemblers with an additional redundancy-reducing step could generate an optimized outcome in terms of completeness, annotatability, and ORF richness. Based on the normalized scores of 16 benchmarking metrics, the assemblers ranked from best to worst as follows: EvidentialGene, BinPacker, Trinity, rnaSPAdes, CAP3, IDBA-trans, and Velvet-Oases. EvidentialGene, the best performer, produced a total of 316,786 transcripts, of which 235,730 (74%) were predicted to have a unique protein hit (against UniRef100), and half of its transcripts contained an ORF. The total number of unique BLAST hits for EvidentialGene was approximately three times that of the worst assembler (Velvet-Oases). EvidentialGene even captured 17% and 7% more average BLAST hits than BinPacker and Trinity. Although BinPacker and CAP3 produced longer transcripts, EvidentialGene showed a higher collinearity between transcript size and ORF length. Compared with the other programs, EvidentialGene yielded more optimal transcript sets, more full-length transcripts, and fewer possible misassemblies. Our findings corroborate that in non-model species, relying on a single assembler may not give an entirely satisfactory result. Therefore, this study proposes an ensemble approach that incorporates the EvidentialGene pipeline to obtain a superior assembly for T. daenensis.
format Online
Article
Text
id pubmed-10390528
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10390528 2023-08-02 Optimizing an efficient ensemble approach for high-quality de novo transcriptome assembly of Thymus daenensis Ahmadi, Hosein Sheikh-Assadi, Morteza Fatahi, Reza Zamani, Zabihollah Shokrpour, Majid Sci Rep Article Nature Publishing Group UK 2023-07-31 /pmc/articles/PMC10390528/ /pubmed/37524806 http://dx.doi.org/10.1038/s41598-023-39620-6 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Optimizing an efficient ensemble approach for high-quality de novo transcriptome assembly of Thymus daenensis
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10390528/
https://www.ncbi.nlm.nih.gov/pubmed/37524806
http://dx.doi.org/10.1038/s41598-023-39620-6