HMMSplicer: A Tool for Efficient and Sensitive Discovery of Known and Novel Splice Junctions in RNA-Seq Data
Main authors: | Dimon, Michelle T.; Sorber, Katherine; DeRisi, Joseph L. |
---|---|
Format: | Text |
Language: | English |
Published: | Public Library of Science, 2010 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2975632/ https://www.ncbi.nlm.nih.gov/pubmed/21079731 http://dx.doi.org/10.1371/journal.pone.0013875 |
_version_ | 1782190950709198848 |
---|---|
author | Dimon, Michelle T.; Sorber, Katherine; DeRisi, Joseph L. |
author_sort | Dimon, Michelle T. |
collection | PubMed |
description | BACKGROUND: High-throughput sequencing of an organism's transcriptome, or RNA-Seq, is a valuable and versatile new strategy for capturing snapshots of gene expression. However, transcriptome sequencing creates a new class of alignment problem: mapping short reads that span exon-exon junctions back to the reference genome, especially in the case where a splice junction is previously unknown. METHODOLOGY/PRINCIPAL FINDINGS: Here we introduce HMMSplicer, an accurate and efficient algorithm for discovering canonical and non-canonical splice junctions in short read datasets. HMMSplicer identifies more splice junctions than currently available algorithms when tested on publicly available A. thaliana, P. falciparum, and H. sapiens datasets without a reduction in specificity. CONCLUSIONS/SIGNIFICANCE: HMMSplicer was found to perform especially well in compact genomes and on genes with low expression levels, alternative splice isoforms, or non-canonical splice junctions. Because HMMSplicer does not rely on pre-built gene models, the products of inexact splicing are also detected. For H. sapiens, we find 3.6% of 3′ splice sites and 1.4% of 5′ splice sites are inexact, typically differing by 3 bases in either direction. In addition, HMMSplicer provides a score for every predicted junction allowing the user to set a threshold to tune false positive rates depending on the needs of the experiment. HMMSplicer is implemented in Python. Code and documentation are freely available at http://derisilab.ucsf.edu/software/hmmsplicer. |
format | Text |
id | pubmed-2975632 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-2975632 2010-11-15 HMMSplicer: A Tool for Efficient and Sensitive Discovery of Known and Novel Splice Junctions in RNA-Seq Data Dimon, Michelle T. Sorber, Katherine DeRisi, Joseph L. PLoS One Research Article BACKGROUND: High-throughput sequencing of an organism's transcriptome, or RNA-Seq, is a valuable and versatile new strategy for capturing snapshots of gene expression. However, transcriptome sequencing creates a new class of alignment problem: mapping short reads that span exon-exon junctions back to the reference genome, especially in the case where a splice junction is previously unknown. METHODOLOGY/PRINCIPAL FINDINGS: Here we introduce HMMSplicer, an accurate and efficient algorithm for discovering canonical and non-canonical splice junctions in short read datasets. HMMSplicer identifies more splice junctions than currently available algorithms when tested on publicly available A. thaliana, P. falciparum, and H. sapiens datasets without a reduction in specificity. CONCLUSIONS/SIGNIFICANCE: HMMSplicer was found to perform especially well in compact genomes and on genes with low expression levels, alternative splice isoforms, or non-canonical splice junctions. Because HMMSplicer does not rely on pre-built gene models, the products of inexact splicing are also detected. For H. sapiens, we find 3.6% of 3′ splice sites and 1.4% of 5′ splice sites are inexact, typically differing by 3 bases in either direction. In addition, HMMSplicer provides a score for every predicted junction allowing the user to set a threshold to tune false positive rates depending on the needs of the experiment. HMMSplicer is implemented in Python. Code and documentation are freely available at http://derisilab.ucsf.edu/software/hmmsplicer. Public Library of Science 2010-11-08 /pmc/articles/PMC2975632/ /pubmed/21079731 http://dx.doi.org/10.1371/journal.pone.0013875 Text en Dimon et al. 
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Dimon, Michelle T. Sorber, Katherine DeRisi, Joseph L. HMMSplicer: A Tool for Efficient and Sensitive Discovery of Known and Novel Splice Junctions in RNA-Seq Data |
title | HMMSplicer: A Tool for Efficient and Sensitive Discovery of Known and Novel Splice Junctions in RNA-Seq Data |
title_sort | hmmsplicer: a tool for efficient and sensitive discovery of known and novel splice junctions in rna-seq data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2975632/ https://www.ncbi.nlm.nih.gov/pubmed/21079731 http://dx.doi.org/10.1371/journal.pone.0013875 |
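The description notes that HMMSplicer reports both canonical and non-canonical splice junctions and gives every predicted junction a score that the user can threshold to tune the false positive rate. The Python sketch below illustrates that kind of post-processing on a toy genome. It is not HMMSplicer code: the `Junction` record, its 0-based intron coordinates, the helper names, and the score values are all hypothetical, and "canonical" is taken here to mean the GT-AG dinucleotide pair, which may not match the tool's own definition.

```python
from dataclasses import dataclass
from typing import Dict, List


# Hypothetical junction record; these fields are illustrative and are NOT
# HMMSplicer's actual output format.
@dataclass(frozen=True)
class Junction:
    chrom: str
    intron_start: int  # 0-based index of the first intron base
    intron_end: int    # 0-based, exclusive: one past the last intron base
    score: float       # hypothetical score; higher means more confident


def splice_motif(genome: Dict[str, str], j: Junction) -> str:
    """Return the first two and last two intron bases on the forward strand, e.g. 'GT-AG'."""
    seq = genome[j.chrom]
    donor = seq[j.intron_start:j.intron_start + 2].upper()
    acceptor = seq[j.intron_end - 2:j.intron_end].upper()
    return f"{donor}-{acceptor}"


def is_canonical(genome: Dict[str, str], j: Junction) -> bool:
    """Treat GT-AG as canonical (an assumption). A minus-strand GT-AG intron
    reads CT-AC on the forward strand, so both orientations are accepted."""
    return splice_motif(genome, j) in {"GT-AG", "CT-AC"}


def filter_by_score(junctions: List[Junction], threshold: float) -> List[Junction]:
    """Keep junctions scoring at or above a user-chosen threshold; raising the
    threshold trades sensitivity for fewer false positives."""
    return [j for j in junctions if j.score >= threshold]


if __name__ == "__main__":
    # Toy genome: 3-base exon, 9-base intron (indices 3..11), 3-base exon.
    genome = {
        "chr1": "AAAGTCCCCCAGTTT",  # GT-AG intron on the plus strand
        "chr2": "AAAATCCCCCACTTT",  # AT-AC intron: non-canonical under this definition
    }
    calls = [Junction("chr1", 3, 12, 850.0), Junction("chr2", 3, 12, 420.0)]
    for j in filter_by_score(calls, 500.0):
        print(j.chrom, splice_motif(genome, j), is_canonical(genome, j))
```

With a threshold of 500.0 only the `chr1` call survives; the threshold value and score scale are invented for the example, and a real cutoff would depend on the scores the tool actually reports and on how many false positives the experiment can tolerate.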