HMMSplicer: A Tool for Efficient and Sensitive Discovery of Known and Novel Splice Junctions in RNA-Seq Data

BACKGROUND: High-throughput sequencing of an organism's transcriptome, or RNA-Seq, is a valuable and versatile new strategy for capturing snapshots of gene expression. However, transcriptome sequencing creates a new class of alignment problem: mapping short reads that span exon-exon junctions b...

Bibliographic Details
Main authors: Dimon, Michelle T., Sorber, Katherine, DeRisi, Joseph L.
Format: Text
Language: English
Published: Public Library of Science 2010
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2975632/
https://www.ncbi.nlm.nih.gov/pubmed/21079731
http://dx.doi.org/10.1371/journal.pone.0013875
author Dimon, Michelle T.
Sorber, Katherine
DeRisi, Joseph L.
collection PubMed
description BACKGROUND: High-throughput sequencing of an organism's transcriptome, or RNA-Seq, is a valuable and versatile new strategy for capturing snapshots of gene expression. However, transcriptome sequencing creates a new class of alignment problem: mapping short reads that span exon-exon junctions back to the reference genome, especially in the case where a splice junction is previously unknown. METHODOLOGY/PRINCIPAL FINDINGS: Here we introduce HMMSplicer, an accurate and efficient algorithm for discovering canonical and non-canonical splice junctions in short read datasets. HMMSplicer identifies more splice junctions than currently available algorithms when tested on publicly available A. thaliana, P. falciparum, and H. sapiens datasets without a reduction in specificity. CONCLUSIONS/SIGNIFICANCE: HMMSplicer was found to perform especially well in compact genomes and on genes with low expression levels, alternative splice isoforms, or non-canonical splice junctions. Because HMMSplicer does not rely on pre-built gene models, the products of inexact splicing are also detected. For H. sapiens, we find 3.6% of 3′ splice sites and 1.4% of 5′ splice sites are inexact, typically differing by 3 bases in either direction. In addition, HMMSplicer provides a score for every predicted junction, allowing the user to set a threshold to tune false positive rates depending on the needs of the experiment. HMMSplicer is implemented in Python. Code and documentation are freely available at http://derisilab.ucsf.edu/software/hmmsplicer.
format Text
id pubmed-2975632
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2975632 2010-11-15 HMMSplicer: A Tool for Efficient and Sensitive Discovery of Known and Novel Splice Junctions in RNA-Seq Data Dimon, Michelle T. Sorber, Katherine DeRisi, Joseph L. PLoS One Research Article BACKGROUND: High-throughput sequencing of an organism's transcriptome, or RNA-Seq, is a valuable and versatile new strategy for capturing snapshots of gene expression. However, transcriptome sequencing creates a new class of alignment problem: mapping short reads that span exon-exon junctions back to the reference genome, especially in the case where a splice junction is previously unknown. METHODOLOGY/PRINCIPAL FINDINGS: Here we introduce HMMSplicer, an accurate and efficient algorithm for discovering canonical and non-canonical splice junctions in short read datasets. HMMSplicer identifies more splice junctions than currently available algorithms when tested on publicly available A. thaliana, P. falciparum, and H. sapiens datasets without a reduction in specificity. CONCLUSIONS/SIGNIFICANCE: HMMSplicer was found to perform especially well in compact genomes and on genes with low expression levels, alternative splice isoforms, or non-canonical splice junctions. Because HMMSplicer does not rely on pre-built gene models, the products of inexact splicing are also detected. For H. sapiens, we find 3.6% of 3′ splice sites and 1.4% of 5′ splice sites are inexact, typically differing by 3 bases in either direction. In addition, HMMSplicer provides a score for every predicted junction, allowing the user to set a threshold to tune false positive rates depending on the needs of the experiment. HMMSplicer is implemented in Python. Code and documentation are freely available at http://derisilab.ucsf.edu/software/hmmsplicer. Public Library of Science 2010-11-08 /pmc/articles/PMC2975632/ /pubmed/21079731 http://dx.doi.org/10.1371/journal.pone.0013875 Text en Dimon et al.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title HMMSplicer: A Tool for Efficient and Sensitive Discovery of Known and Novel Splice Junctions in RNA-Seq Data
topic Research Article