The Impacts of Read Length and Transcriptome Complexity for De Novo Assembly: A Simulation Study

Transcriptome assembly using RNA-seq data, particularly in non-model organisms, has been dramatically improved, but only recently have the pre-assembly procedures, such as sequencing depth and error correction, been studied. Increasing read length is viewed as a crucial condition to further improve...

Bibliographic Details
Main authors: Chang, Zheng; Wang, Zhenjia; Li, Guojun
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3988101/
https://www.ncbi.nlm.nih.gov/pubmed/24736633
http://dx.doi.org/10.1371/journal.pone.0094825
collection PubMed
description Transcriptome assembly using RNA-seq data, particularly in non-model organisms, has been dramatically improved, but only recently have the pre-assembly procedures, such as sequencing depth and error correction, been studied. Increasing read length is viewed as a crucial condition for further improving transcriptome assembly, but it is unknown whether read length really matters. In addition, though many assembly tools are now available, it is unclear whether the existing assemblers perform well enough on data with different transcriptome complexities. In this paper, we studied these two open problems using two high-performing assemblers, Velvet/Oases and Trinity, on several simulated datasets of human, mouse and S. cerevisiae. The results suggest that (1) the read length of paired reads does not matter once it exceeds a certain threshold, and interestingly, the threshold differs between organisms; (2) the quality of de novo assembly decreases sharply as transcriptome complexity increases, and all existing de novo assemblers tend to fail whenever the genes contain a large number of alternative splicing events.
format Online
Article
Text
id pubmed-3988101
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3988101 2014-04-21
The Impacts of Read Length and Transcriptome Complexity for De Novo Assembly: A Simulation Study
Chang, Zheng; Wang, Zhenjia; Li, Guojun
PLoS One
Research Article
Public Library of Science 2014-04-15
/pmc/articles/PMC3988101/
/pubmed/24736633
http://dx.doi.org/10.1371/journal.pone.0094825
Text en
© 2014 Chang et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
topic Research Article