Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads
BACKGROUND: Next-generation sequencing of cellular RNA (RNA-seq) is rapidly becoming the cornerstone of transcriptomic analysis. However, sequencing errors in the already short RNA-seq reads complicate bioinformatics analyses, in particular alignment and assembly. Error correction methods have been...
Main authors: | Song, Li; Florea, Liliana |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4615873/ https://www.ncbi.nlm.nih.gov/pubmed/26500767 http://dx.doi.org/10.1186/s13742-015-0089-y |
_version_ | 1782396525385613312 |
---|---|
author | Song, Li Florea, Liliana |
author_facet | Song, Li Florea, Liliana |
author_sort | Song, Li |
collection | PubMed |
description | BACKGROUND: Next-generation sequencing of cellular RNA (RNA-seq) is rapidly becoming the cornerstone of transcriptomic analysis. However, sequencing errors in the already short RNA-seq reads complicate bioinformatics analyses, in particular alignment and assembly. Error correction methods have been highly effective for whole-genome sequencing (WGS) reads, but are unsuitable for RNA-seq reads, owing to the variation in gene expression levels and alternative splicing. FINDINGS: We developed a k-mer based method, Rcorrector, to correct random sequencing errors in Illumina RNA-seq reads. Rcorrector uses a De Bruijn graph to compactly represent all trusted k-mers in the input reads. Unlike WGS read correctors, which use a global threshold to determine trusted k-mers, Rcorrector computes a local threshold at every position in a read. CONCLUSIONS: Rcorrector has an accuracy higher than or comparable to existing methods, including the only other method (SEECER) designed for RNA-seq reads, and is more time and memory efficient. With a 5 GB memory footprint for 100 million reads, it can be run on virtually any desktop or server. The software is available free of charge under the GNU General Public License from https://github.com/mourisl/Rcorrector/. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13742-015-0089-y) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4615873 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-46158732015-10-23 Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads Song, Li Florea, Liliana Gigascience Technical Note BACKGROUND: Next-generation sequencing of cellular RNA (RNA-seq) is rapidly becoming the cornerstone of transcriptomic analysis. However, sequencing errors in the already short RNA-seq reads complicate bioinformatics analyses, in particular alignment and assembly. Error correction methods have been highly effective for whole-genome sequencing (WGS) reads, but are unsuitable for RNA-seq reads, owing to the variation in gene expression levels and alternative splicing. FINDINGS: We developed a k-mer based method, Rcorrector, to correct random sequencing errors in Illumina RNA-seq reads. Rcorrector uses a De Bruijn graph to compactly represent all trusted k-mers in the input reads. Unlike WGS read correctors, which use a global threshold to determine trusted k-mers, Rcorrector computes a local threshold at every position in a read. CONCLUSIONS: Rcorrector has an accuracy higher than or comparable to existing methods, including the only other method (SEECER) designed for RNA-seq reads, and is more time and memory efficient. With a 5 GB memory footprint for 100 million reads, it can be run on virtually any desktop or server. The software is available free of charge under the GNU General Public License from https://github.com/mourisl/Rcorrector/. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13742-015-0089-y) contains supplementary material, which is available to authorized users. BioMed Central 2015-10-19 /pmc/articles/PMC4615873/ /pubmed/26500767 http://dx.doi.org/10.1186/s13742-015-0089-y Text en © Song and Florea. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Technical Note Song, Li Florea, Liliana Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads |
title | Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads |
title_full | Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads |
title_fullStr | Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads |
title_full_unstemmed | Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads |
title_short | Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads |
title_sort | rcorrector: efficient and accurate error correction for illumina rna-seq reads |
topic | Technical Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4615873/ https://www.ncbi.nlm.nih.gov/pubmed/26500767 http://dx.doi.org/10.1186/s13742-015-0089-y |
work_keys_str_mv | AT songli rcorrectorefficientandaccurateerrorcorrectionforilluminarnaseqreads AT florealiliana rcorrectorefficientandaccurateerrorcorrectionforilluminarnaseqreads |
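
Illustrative note: the description field above contrasts a single global k-mer threshold (as used by WGS read correctors) with a threshold computed locally at every position in a read. The toy Python sketch below may help make that contrast concrete. It is not Rcorrector's algorithm: the function names, the `window` and `fraction` parameters, and the rule "trusted if the count reaches a fraction of the highest count among neighbouring k-mers" are all assumptions made for illustration, and the sketch uses a plain counter rather than a De Bruijn graph.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer that occurs in the given reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def global_trust(read, counts, k, threshold):
    """WGS-style rule: one fixed count threshold for every k-mer."""
    return [counts[read[i:i + k]] >= threshold
            for i in range(len(read) - k + 1)]


def local_trust(read, counts, k, window=2, fraction=0.5):
    """Hypothetical local rule (an assumption, not the published formula):
    a k-mer is trusted when its count reaches `fraction` of the highest
    count among the k-mers within `window` positions of it in this read."""
    kmer_counts = [counts[read[i:i + k]] for i in range(len(read) - k + 1)]
    flags = []
    for i, c in enumerate(kmer_counts):
        lo = max(0, i - window)
        hi = min(len(kmer_counts), i + window + 1)
        flags.append(c >= fraction * max(kmer_counts[lo:hi]))
    return flags


# Made-up data: a highly expressed transcript (10 reads), one copy of it
# with an error in the last base, and a lowly expressed transcript (2 reads).
high, high_err, low = "ACGTACGGTC", "ACGTACGGTA", "TTGCAAGCTA"
reads = [high] * 10 + [high_err] + [low] * 2
counts = count_kmers(reads, k=4)

print(global_trust(low, counts, k=4, threshold=3))   # every k-mer rejected
print(local_trust(low, counts, k=4))                 # every k-mer trusted
print(local_trust(high_err, counts, k=4))            # only the last k-mer rejected
```

In this toy data the global threshold discards all k-mers of the lowly expressed transcript, while the local rule keeps them and still flags the single erroneous k-mer in the highly expressed one, which is the motivation the abstract gives for avoiding a global threshold on RNA-seq data.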