Detection of splicing events and multiread locations from RNA-seq data based on a geometric-tail (GT) distribution of intron length
BACKGROUND: RNA sequencing (RNA-seq) measures gene expression levels and permits splicing analysis. Many existing aligners are capable of mapping millions of sequencing reads onto a reference genome. For reads that can be mapped to multiple positions along the reference genome (multireads), these al...
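Main authors: | Lou, Shao-Ke; Li, Jing-Woei; Qin, Hao; Yim, Aldrin Kay-Yuen; Lo, Leung-Yau; Ni, Bing; Leung, Kwong-Sak; Tsui, Stephen Kwok-Wing; Chan, Ting-Fung |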
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2011 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3226252/ https://www.ncbi.nlm.nih.gov/pubmed/21988959 http://dx.doi.org/10.1186/1471-2105-12-S5-S2 |
_version_ | 1782217586687082496 |
---|---|
author | Lou, Shao-Ke Li, Jing-Woei Qin, Hao Yim, Aldrin Kay-Yuen Lo, Leung-Yau Ni, Bing Leung, Kwong-Sak Tsui, Stephen Kwok-Wing Chan, Ting-Fung |
author_sort | Lou, Shao-Ke |
collection | PubMed |
description | BACKGROUND: RNA sequencing (RNA-seq) measures gene expression levels and permits splicing analysis. Many existing aligners are capable of mapping millions of sequencing reads onto a reference genome. For reads that can be mapped to multiple positions along the reference genome (multireads), these aligners may either randomly assign them to a location or discard them altogether. Either choice could bias downstream analyses. Meanwhile, challenges remain in the alignment of reads spanning splice junctions. Existing splicing-aware aligners that rely on the read-count method to identify junction sites are inevitably affected by sequencing depth. RESULTS: The distance between the aligned positions of paired-end (PE) reads, or of the two parts of a spliced read, depends on the experimental protocol and gene structure. We propose a new method that employs an empirical geometric-tail (GT) distribution of intron lengths to make a rational choice in multiread selection and splice-site detection, according to the aligned distances from PE and spliced reads. CONCLUSIONS: GT models that combine sequence similarity from alignment with the probability from the length distribution could accurately determine the location of both multireads and spliced reads. |
format | Online Article Text |
id | pubmed-3226252 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3226252 2011-11-30 Detection of splicing events and multiread locations from RNA-seq data based on a geometric-tail (GT) distribution of intron length Lou, Shao-Ke Li, Jing-Woei Qin, Hao Yim, Aldrin Kay-Yuen Lo, Leung-Yau Ni, Bing Leung, Kwong-Sak Tsui, Stephen Kwok-Wing Chan, Ting-Fung BMC Bioinformatics Proceedings BACKGROUND: RNA sequencing (RNA-seq) measures gene expression levels and permits splicing analysis. Many existing aligners are capable of mapping millions of sequencing reads onto a reference genome. For reads that can be mapped to multiple positions along the reference genome (multireads), these aligners may either randomly assign them to a location or discard them altogether. Either choice could bias downstream analyses. Meanwhile, challenges remain in the alignment of reads spanning splice junctions. Existing splicing-aware aligners that rely on the read-count method to identify junction sites are inevitably affected by sequencing depth. RESULTS: The distance between the aligned positions of paired-end (PE) reads, or of the two parts of a spliced read, depends on the experimental protocol and gene structure. We propose a new method that employs an empirical geometric-tail (GT) distribution of intron lengths to make a rational choice in multiread selection and splice-site detection, according to the aligned distances from PE and spliced reads. CONCLUSIONS: GT models that combine sequence similarity from alignment with the probability from the length distribution could accurately determine the location of both multireads and spliced reads. BioMed Central 2011-07-27 /pmc/articles/PMC3226252/ /pubmed/21988959 http://dx.doi.org/10.1186/1471-2105-12-S5-S2 Text en Copyright ©2011 Lou et al; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Detection of splicing events and multiread locations from RNA-seq data based on a geometric-tail (GT) distribution of intron length |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3226252/ https://www.ncbi.nlm.nih.gov/pubmed/21988959 http://dx.doi.org/10.1186/1471-2105-12-S5-S2 |
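
The abstract describes choosing among candidate read locations by combining alignment similarity with the probability of the aligned distance under a geometric-tail (GT) distribution. The record does not give the paper's parametrization, so the following is only a minimal sketch under stated assumptions: the distribution is taken to be empirical frequencies below a cutoff with a geometric decay above it, and the two scores are combined by a simple product. The names `GeometricTail`, `pick_location`, `cutoff`, and `floor` are hypothetical and do not come from the paper.

```python
from collections import Counter
from typing import Iterable, List, Tuple


class GeometricTail:
    """Illustrative GT model: empirical body below `cutoff`, geometric tail above."""

    def __init__(self, lengths: Iterable[int], cutoff: int) -> None:
        lengths = list(lengths)
        if not lengths:
            raise ValueError("need at least one observed length")
        n = len(lengths)
        body = Counter(x for x in lengths if x < cutoff)
        tail = [x for x in lengths if x >= cutoff]
        self.cutoff = cutoff
        # Empirical probability of each observed length below the cutoff.
        self.body_prob = {x: count / n for x, count in body.items()}
        # Total probability mass assigned to the tail.
        self.tail_mass = len(tail) / n
        if tail:
            # Maximum-likelihood estimate for a geometric distribution
            # on the offsets (x - cutoff) = 0, 1, 2, ...
            mean_offset = sum(x - cutoff for x in tail) / len(tail)
            self.p = 1.0 / (1.0 + mean_offset)
        else:
            self.p = 1.0

    def pmf(self, x: int) -> float:
        """Probability of observing length x under the sketch model."""
        if x < self.cutoff:
            return self.body_prob.get(x, 0.0)
        return self.tail_mass * self.p * (1.0 - self.p) ** (x - self.cutoff)


# A candidate is (location label, alignment similarity, aligned distance).
Candidate = Tuple[str, float, int]


def pick_location(candidates: List[Candidate], model: GeometricTail,
                  floor: float = 1e-12) -> Candidate:
    """Return the candidate maximizing similarity * P(distance).

    The product rule and the `floor` for unseen lengths are assumptions
    made for illustration, not the paper's scoring function.
    """
    return max(candidates, key=lambda c: c[1] * max(model.pmf(c[2]), floor))


# Toy training lengths (invented for the example, not real data).
observed = [100] * 5 + [200] * 3 + [1000, 3000]
model = GeometricTail(observed, cutoff=500)

candidates = [("chr1:1000", 0.9, 100), ("chr5:2000", 0.9, 50000)]
best = pick_location(candidates, model)
```

With equal alignment similarity, the candidate whose aligned distance is more probable under the length model is preferred, which is the kind of tie-breaking for multireads that the abstract motivates.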