RNASequel: accurate and repeat tolerant realignment of RNA-seq reads

RNA-seq is a key technology for understanding the biology of the cell because of its ability to profile transcriptional and post-transcriptional regulation at single nucleotide resolutions. Compared to DNA sequencing alignment algorithms, RNA-seq alignment algorithms have a diminished ability to acc...

Bibliographic Details
Main authors: Wilson, Gavin W., Stein, Lincoln D.
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4605292/
https://www.ncbi.nlm.nih.gov/pubmed/26082497
http://dx.doi.org/10.1093/nar/gkv594
author Wilson, Gavin W.
Stein, Lincoln D.
collection PubMed
description RNA-seq is a key technology for understanding the biology of the cell because of its ability to profile transcriptional and post-transcriptional regulation at single nucleotide resolutions. Compared to DNA sequencing alignment algorithms, RNA-seq alignment algorithms have a diminished ability to accurately detect and map base pair substitutions, gaps, discordant pairs and repetitive regions. These shortcomings adversely affect experiments that require a high degree of accuracy, notably the ability to detect RNA editing. We have developed RNASequel, a software package that runs as a post-processing step in conjunction with an RNA-seq aligner and systematically corrects common alignment artifacts. Its key innovations are a two-pass splice junction alignment system that includes de novo splice junctions and the use of an empirically determined estimate of the fragment size distribution when resolving read pairs. We demonstrate that RNASequel produces improved alignments when used in conjunction with STAR or Tophat2 using two simulated datasets. We then show that RNASequel improves the identification of adenosine to inosine RNA editing sites on biological datasets. This software will be useful in applications requiring the accurate identification of variants in RNA sequencing data, the discovery of RNA editing sites and the analysis of alternative splicing.
format Online
Article
Text
id pubmed-4605292
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4605292 2015-10-19 RNASequel: accurate and repeat tolerant realignment of RNA-seq reads Wilson, Gavin W. Stein, Lincoln D. Nucleic Acids Res Methods Online Oxford University Press 2015-10-15 2015-10-10 /pmc/articles/PMC4605292/ /pubmed/26082497 http://dx.doi.org/10.1093/nar/gkv594 Text en © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title RNASequel: accurate and repeat tolerant realignment of RNA-seq reads
topic Methods Online