
Optimizing an efficient ensemble approach for high-quality de novo transcriptome assembly of Thymus daenensis

Non-erroneous and well-optimized transcriptome assembly is a crucial prerequisite for authentic downstream analyses. Each de novo assembler has its own algorithm-dependent strengths and weaknesses in handling assembly issues and should be tested specifically for each dataset. Here, we examined the efficiency of...


Bibliographic Details
Main authors:	Ahmadi, Hosein, Sheikh-Assadi, Morteza, Fatahi, Reza, Zamani, Zabihollah, Shokrpour, Majid
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK, 2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10390528/ https://www.ncbi.nlm.nih.gov/pubmed/37524806 http://dx.doi.org/10.1038/s41598-023-39620-6

author	Ahmadi, Hosein Sheikh-Assadi, Morteza Fatahi, Reza Zamani, Zabihollah Shokrpour, Majid
collection	PubMed
description	Non-erroneous and well-optimized transcriptome assembly is a crucial prerequisite for authentic downstream analyses. Each de novo assembler has its own algorithm-dependent strengths and weaknesses in handling assembly issues and should be tested specifically for each dataset. Here, we examined the efficiency of seven state-of-the-art assemblers on ~30 Gb of data obtained from mRNA sequencing of Thymus daenensis. In an ensemble workflow, combining the outputs of different assemblers with an additional redundancy-reducing step could generate an optimized outcome in terms of completeness, annotatability, and ORF richness. Based on the normalized scores of 16 benchmarking metrics, the assemblers ranked, from best to worst, as follows: EvidentialGene, BinPacker, Trinity, rnaSPAdes, CAP3, IDBA-trans, and Velvet-Oases. EvidentialGene, the best-performing assembler, produced a total of 316,786 transcripts, of which 235,730 (74%) were predicted to have a unique protein hit (on UniRef100), and half of its transcripts contained an ORF. The total number of unique BLAST hits for EvidentialGene was approximately three times greater than that of the worst-performing assembler (Velvet-Oases). EvidentialGene also captured 17% and 7% more average BLAST hits than BinPacker and Trinity, respectively. Although BinPacker and CAP3 produced longer transcripts, EvidentialGene showed higher collinearity between transcript size and ORF length. Compared with the other programs, EvidentialGene yielded more optimal transcript sets, more full-length transcripts, and fewer possible misassemblies. Our findings corroborate that, in non-model species, relying on a single assembler may not give an entirely satisfactory result. Therefore, this study proposes an ensemble approach that incorporates the EvidentialGene pipeline to obtain a superior assembly for T. daenensis.
format	Online Article Text
id	pubmed-10390528
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-10390528 2023-08-02 Sci Rep Article Nature Publishing Group UK 2023-07-31 /pmc/articles/PMC10390528/ /pubmed/37524806 http://dx.doi.org/10.1038/s41598-023-39620-6 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title	Optimizing an efficient ensemble approach for high-quality de novo transcriptome assembly of Thymus daenensis
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10390528/ https://www.ncbi.nlm.nih.gov/pubmed/37524806 http://dx.doi.org/10.1038/s41598-023-39620-6
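The description ranks the seven assemblers by "normalized scores of 16 benchmarking metrics" but, in this record, does not say how the scores were normalized or combined. The Python sketch below is only an illustration under stated assumptions: it assumes per-metric min-max scaling (inverted for metrics where lower is better, such as misassemblies) followed by an equal-weight sum. The function names `min_max` and `rank_assemblers`, the metric names, and the input values are hypothetical placeholders, not the authors' method or data. The last lines simply recompute the 74% figure from the record's own numbers (235,730 of 316,786 transcripts).

```python
from typing import Dict, List, Tuple


def min_max(values: List[float], higher_is_better: bool = True) -> List[float]:
    """Scale raw metric values to [0, 1]; flip the scale when lower is better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # The metric does not separate the assemblers, so each gets full credit.
        return [1.0] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def rank_assemblers(
    metrics: Dict[str, Dict[str, float]],
    higher_is_better: Dict[str, bool],
) -> List[Tuple[str, float]]:
    """Sum equally weighted normalized scores (an assumption) and sort best-first."""
    names = list(metrics)
    totals = {name: 0.0 for name in names}
    for metric, better_high in higher_is_better.items():
        column = [metrics[name][metric] for name in names]
        for name, score in zip(names, min_max(column, better_high)):
            totals[name] += score
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical placeholder values (NOT data from the paper); two metrics instead of 16.
example = {
    "assembler_A": {"annotated_fraction": 0.70, "misassemblies": 10},
    "assembler_B": {"annotated_fraction": 0.60, "misassemblies": 30},
    "assembler_C": {"annotated_fraction": 0.50, "misassemblies": 25},
}
direction = {"annotated_fraction": True, "misassemblies": False}
ranked = rank_assemblers(example, direction)

# Arithmetic check with figures stated in the record: 235,730 of 316,786 transcripts.
annotated_pct = 100 * 235_730 / 316_786
print(f"EvidentialGene transcripts with a unique UniRef100 hit: {annotated_pct:.1f}%")
```

Min-max scaling puts every metric on the same 0 to 1 range, so a metric with large raw values (for example, transcript counts) cannot dominate the sum; a weighted sum or a rank-based score would be other reasonable choices, and the paper itself should be consulted for the scheme actually used.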
