Cargando…

The impact of read length on quantification of differentially expressed genes and splice junction detection

BACKGROUND: The initial next-generation sequencing technologies produced reads of 25 or 36 bp, and only from a single-end of the library sequence. Currently, it is possible to reliably produce 300 bp paired-end sequences for RNA expression analysis. While read lengths have consistently increased, pe...

Descripción completa

Detalles Bibliográficos
Autores principales: Chhangawala, Sagar, Rudy, Gabe, Mason, Christopher E., Rosenfeld, Jeffrey A.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2015
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4531809/
https://www.ncbi.nlm.nih.gov/pubmed/26100517
http://dx.doi.org/10.1186/s13059-015-0697-y
Descripción
Sumario:BACKGROUND: The initial next-generation sequencing technologies produced reads of 25 or 36 bp, and only from a single-end of the library sequence. Currently, it is possible to reliably produce 300 bp paired-end sequences for RNA expression analysis. While read lengths have consistently increased, people have assumed that longer reads are more informative and that paired-end reads produce better results than single-end reads. We used paired-end 101 bp reads and trimmed them to simulate different read lengths, and also separated the pairs to produce single-end reads. For each read length and paired status, we evaluated differential expression levels between two standard samples and compared the results to those obtained by qPCR. RESULTS: We found that, with the exception of 25 bp reads, there is little difference for the detection of differential expression regardless of the read length. Once single-end reads are at a length of 50 bp, the results do not change substantially for any level up to, and including, 100 bp paired-end. However, splice junction detection significantly improves as the read length increases with 100 bp paired-end showing the best performance. We performed the same analysis on two ENCODE samples and found consistent results confirming that our conclusions have broad application. CONCLUSIONS: A researcher could save substantial resources by using 50 bp single-end reads for differential expression analysis instead of using longer reads. However, splicing detection is unquestionably improved by paired-end and longer reads. Therefore, an appropriate read length should be used based on the final goal of the study. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13059-015-0697-y) contains supplementary material, which is available to authorized users.