
Optimizing de novo transcriptome assembly from short-read RNA-Seq data: a comparative study

BACKGROUND: With rapid advances in next-generation sequencing technology, high-throughput RNA sequencing has emerged as a powerful and cost-effective approach to transcriptome study. De novo assembly of transcripts provides an important solution to transcriptome analysis for organisms with no reference genom...


Bibliographic Details
Main authors: Zhao, Qiong-Yi; Wang, Yi; Kong, Yi-Meng; Luo, Da; Li, Xuan; Hao, Pei
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3287467/
https://www.ncbi.nlm.nih.gov/pubmed/22373417
http://dx.doi.org/10.1186/1471-2105-12-S14-S2
_version_ 1782224669776019456
author Zhao, Qiong-Yi
Wang, Yi
Kong, Yi-Meng
Luo, Da
Li, Xuan
Hao, Pei
collection PubMed
description BACKGROUND: With rapid advances in next-generation sequencing technology, high-throughput RNA sequencing has emerged as a powerful and cost-effective approach to transcriptome study. De novo assembly of transcripts provides an important solution to transcriptome analysis for organisms with no reference genome. However, it was poorly understood how different variables affect assembly outcomes, and there was no consensus on how to reach an optimal solution by selecting a software tool and a suitable strategy based on the properties of the RNA-Seq data. RESULTS: To assess the performance of different programs for transcriptome assembly, this work analyzed several important factors, including k-mer values, genome complexity, coverage depth and directional reads. Seven program conditions were tested: four single k-mer (SK) assemblers (SOAPdenovo, ABySS, Oases and Trinity) and three multiple k-mer (MK) methods (SOAPdenovo-MK, trans-ABySS and Oases-MK). While small and large k-mer values performed better for reconstructing lowly and highly expressed transcripts, respectively, the MK strategy worked well across almost all expression quintiles. Among the SK tools, Trinity performed well across various conditions but had the longest running time. Oases consumed the most memory, whereas SOAPdenovo required the shortest runtime but performed poorly at reconstructing full-length CDS. ABySS showed a good balance between resource usage and assembly quality. CONCLUSIONS: Our work compared the performance of publicly available transcriptome assemblers and analyzed important factors affecting de novo assembly. Practical guidelines for transcript reconstruction from short-read RNA-Seq data were proposed. De novo assembly of the C. sinensis transcriptome was greatly improved using optimized methods.
format Online
Article
Text
id pubmed-3287467
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3287467 2012-02-28 BMC Bioinformatics Proceedings BioMed Central 2011-12-14 /pmc/articles/PMC3287467/ /pubmed/22373417 http://dx.doi.org/10.1186/1471-2105-12-S14-S2 Text en Copyright ©2011 Zhao et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Optimizing de novo transcriptome assembly from short-read RNA-Seq data: a comparative study
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3287467/
https://www.ncbi.nlm.nih.gov/pubmed/22373417
http://dx.doi.org/10.1186/1471-2105-12-S14-S2