
The impact of read length on quantification of differentially expressed genes and splice junction detection

BACKGROUND: The initial next-generation sequencing technologies produced reads of 25 or 36 bp, and only from a single-end of the library sequence. Currently, it is possible to reliably produce 300 bp paired-end sequences for RNA expression analysis. While read lengths have consistently increased, pe...


Bibliographic Details
Main Authors: Chhangawala, Sagar; Rudy, Gabe; Mason, Christopher E.; Rosenfeld, Jeffrey A.
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4531809/
https://www.ncbi.nlm.nih.gov/pubmed/26100517
http://dx.doi.org/10.1186/s13059-015-0697-y
author Chhangawala, Sagar
Rudy, Gabe
Mason, Christopher E.
Rosenfeld, Jeffrey A.
author_sort Chhangawala, Sagar
collection PubMed
description BACKGROUND: The initial next-generation sequencing technologies produced reads of 25 or 36 bp, and only from a single-end of the library sequence. Currently, it is possible to reliably produce 300 bp paired-end sequences for RNA expression analysis. While read lengths have consistently increased, people have assumed that longer reads are more informative and that paired-end reads produce better results than single-end reads. We used paired-end 101 bp reads and trimmed them to simulate different read lengths, and also separated the pairs to produce single-end reads. For each read length and paired status, we evaluated differential expression levels between two standard samples and compared the results to those obtained by qPCR. RESULTS: We found that, with the exception of 25 bp reads, there is little difference for the detection of differential expression regardless of the read length. Once single-end reads are at a length of 50 bp, the results do not change substantially for any level up to, and including, 100 bp paired-end. However, splice junction detection significantly improves as the read length increases with 100 bp paired-end showing the best performance. We performed the same analysis on two ENCODE samples and found consistent results confirming that our conclusions have broad application. CONCLUSIONS: A researcher could save substantial resources by using 50 bp single-end reads for differential expression analysis instead of using longer reads. However, splicing detection is unquestionably improved by paired-end and longer reads. Therefore, an appropriate read length should be used based on the final goal of the study. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13059-015-0697-y) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4531809
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4531809 2015-08-12 Genome Biol Research BioMed Central 2015-06-23 2015 /pmc/articles/PMC4531809/ /pubmed/26100517 http://dx.doi.org/10.1186/s13059-015-0697-y Text en © Chhangawala et al. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title The impact of read length on quantification of differentially expressed genes and splice junction detection
title_sort impact of read length on quantification of differentially expressed genes and splice junction detection
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4531809/
https://www.ncbi.nlm.nih.gov/pubmed/26100517
http://dx.doi.org/10.1186/s13059-015-0697-y