Optimizing an efficient ensemble approach for high-quality de novo transcriptome assembly of Thymus daenensis
Accurate, well-optimized transcriptome assembly is a crucial prerequisite for reliable downstream analyses. Each de novo assembler has its own algorithm-dependent strengths and weaknesses in handling assembly problems and should be tested specifically for each dataset. Here, we examined the efficiency of...
Main authors: | Ahmadi, Hosein; Sheikh-Assadi, Morteza; Fatahi, Reza; Zamani, Zabihollah; Shokrpour, Majid |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10390528/ https://www.ncbi.nlm.nih.gov/pubmed/37524806 http://dx.doi.org/10.1038/s41598-023-39620-6 |
_version_ | 1785082495608291328 |
---|---|
author | Ahmadi, Hosein; Sheikh-Assadi, Morteza; Fatahi, Reza; Zamani, Zabihollah; Shokrpour, Majid |
author_sort | Ahmadi, Hosein |
collection | PubMed |
description | Accurate, well-optimized transcriptome assembly is a crucial prerequisite for reliable downstream analyses. Each de novo assembler has its own algorithm-dependent strengths and weaknesses in handling assembly problems and should be tested specifically for each dataset. Here, we examined the efficiency of seven state-of-the-art assemblers on ~30 Gb of data obtained from mRNA sequencing of Thymus daenensis. In an ensemble workflow, combining the outputs of different assemblers with an additional redundancy-reducing step could generate an optimized outcome in terms of completeness, annotatability, and ORF richness. Based on the normalized scores of 16 benchmarking metrics, the assemblers ranked from best to worst as follows: EvidentialGene, BinPacker, Trinity, rnaSPAdes, CAP3, IDBA-trans, and Velvet-Oases. EvidentialGene, the best assembler, produced a total of 316,786 transcripts, of which 235,730 (74%) were predicted to have a unique protein hit (against UniRef100), and half of its transcripts contained an ORF. The total number of unique BLAST hits for EvidentialGene was approximately three times greater than that of the worst assembler (Velvet-Oases). EvidentialGene also captured 17% and 7% more average BLAST hits than BinPacker and Trinity, respectively. Although BinPacker and CAP3 produced longer transcripts, EvidentialGene showed a higher collinearity between transcript size and ORF length. Compared with the other programs, EvidentialGene yielded a higher number of optimal transcript sets, more full-length transcripts, and fewer possible misassemblies. Our findings corroborate that, in non-model species, relying on a single assembler may not give an entirely satisfactory result. Therefore, this study proposes an ensemble approach using the EvidentialGene pipeline to obtain a superior assembly for T. daenensis. |
format | Online Article Text |
id | pubmed-10390528 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-10390528 2023-08-02 Sci Rep Article Nature Publishing Group UK 2023-07-31 /pmc/articles/PMC10390528/ /pubmed/37524806 http://dx.doi.org/10.1038/s41598-023-39620-6 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
title | Optimizing an efficient ensemble approach for high-quality de novo transcriptome assembly of Thymus daenensis |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10390528/ https://www.ncbi.nlm.nih.gov/pubmed/37524806 http://dx.doi.org/10.1038/s41598-023-39620-6 |