The impact of read length on quantification of differentially expressed genes and splice junction detection
BACKGROUND: The initial next-generation sequencing technologies produced reads of 25 or 36 bp, and only from a single-end of the library sequence. Currently, it is possible to reliably produce 300 bp paired-end sequences for RNA expression analysis. While read lengths have consistently increased, pe...
Main authors: | Chhangawala, Sagar; Rudy, Gabe; Mason, Christopher E.; Rosenfeld, Jeffrey A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4531809/ https://www.ncbi.nlm.nih.gov/pubmed/26100517 http://dx.doi.org/10.1186/s13059-015-0697-y |
_version_ | 1782385119715131392 |
---|---|
author | Chhangawala, Sagar; Rudy, Gabe; Mason, Christopher E.; Rosenfeld, Jeffrey A. |
author_sort | Chhangawala, Sagar |
collection | PubMed |
description | BACKGROUND: The initial next-generation sequencing technologies produced reads of 25 or 36 bp, and only from a single-end of the library sequence. Currently, it is possible to reliably produce 300 bp paired-end sequences for RNA expression analysis. While read lengths have consistently increased, people have assumed that longer reads are more informative and that paired-end reads produce better results than single-end reads. We used paired-end 101 bp reads and trimmed them to simulate different read lengths, and also separated the pairs to produce single-end reads. For each read length and paired status, we evaluated differential expression levels between two standard samples and compared the results to those obtained by qPCR. RESULTS: We found that, with the exception of 25 bp reads, there is little difference for the detection of differential expression regardless of the read length. Once single-end reads are at a length of 50 bp, the results do not change substantially for any level up to, and including, 100 bp paired-end. However, splice junction detection significantly improves as the read length increases with 100 bp paired-end showing the best performance. We performed the same analysis on two ENCODE samples and found consistent results confirming that our conclusions have broad application. CONCLUSIONS: A researcher could save substantial resources by using 50 bp single-end reads for differential expression analysis instead of using longer reads. However, splicing detection is unquestionably improved by paired-end and longer reads. Therefore, an appropriate read length should be used based on the final goal of the study. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13059-015-0697-y) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4531809 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4531809 2015-08-12 The impact of read length on quantification of differentially expressed genes and splice junction detection. Chhangawala, Sagar; Rudy, Gabe; Mason, Christopher E.; Rosenfeld, Jeffrey A. Genome Biol. Research. BioMed Central 2015-06-23 2015 /pmc/articles/PMC4531809/ /pubmed/26100517 http://dx.doi.org/10.1186/s13059-015-0697-y Text en © Chhangawala et al. 2015. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | The impact of read length on quantification of differentially expressed genes and splice junction detection |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4531809/ https://www.ncbi.nlm.nih.gov/pubmed/26100517 http://dx.doi.org/10.1186/s13059-015-0697-y |
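
The abstract describes trimming paired-end 101 bp reads to simulate shorter read lengths, and separating the pairs to produce single-end reads. The following is a minimal sketch of that idea, not the authors' pipeline. It assumes that trimming keeps the first N bases of each read and that the single-end set is built from the first mate only; the record states neither. The names `Read`, `trim_read`, and `simulate_reads` are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class Read(NamedTuple):
    """One sequencing read: identifier, bases, and per-base quality string."""
    name: str
    seq: str
    qual: str


def trim_read(read: Read, length: int) -> Read:
    # Assumption: keep the first `length` bases (trim from the 3' end).
    return Read(read.name, read.seq[:length], read.qual[:length])


def simulate_reads(
    pairs: Iterable[Tuple[Read, Read]], length: int, paired: bool = True
) -> Iterator[Tuple[Read, ...]]:
    """Yield trimmed reads as pairs, or as single-end reads when paired=False."""
    if length <= 0:
        raise ValueError("length must be positive")
    for mate1, mate2 in pairs:
        first = trim_read(mate1, length)
        if paired:
            yield (first, trim_read(mate2, length))
        else:
            # Assumption: the single-end set uses only the first mate.
            yield (first,)


if __name__ == "__main__":
    # Toy 101 bp pair standing in for real FASTQ records.
    toy = [(Read("frag1/1", "A" * 101, "I" * 101),
            Read("frag1/2", "C" * 101, "I" * 101))]
    for n in (25, 50, 75, 100):
        single = list(simulate_reads(toy, n, paired=False))
        both = list(simulate_reads(toy, n, paired=True))
        print(n, len(single[0][0].seq), len(both[0][1].seq))
```

Deriving every read length from the same source reads means that any difference in downstream results reflects read length and pairing rather than a different library or sequencing run.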