
Detection of splicing events and multiread locations from RNA-seq data based on a geometric-tail (GT) distribution of intron length

BACKGROUND: RNA sequencing (RNA-seq) measures gene expression levels and permits splicing analysis. Many existing aligners are capable of mapping millions of sequencing reads onto a reference genome. For reads that can be mapped to multiple positions along the reference genome (multireads), these al...

Bibliographic Details
Main Authors: Lou, Shao-Ke, Li, Jing-Woei, Qin, Hao, Yim, Aldrin Kay-Yuen, Lo, Leung-Yau, Ni, Bing, Leung, Kwong-Sak, Tsui, Stephen Kwok-Wing, Chan, Ting-Fung
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3226252/
https://www.ncbi.nlm.nih.gov/pubmed/21988959
http://dx.doi.org/10.1186/1471-2105-12-S5-S2
author Lou, Shao-Ke
Li, Jing-Woei
Qin, Hao
Yim, Aldrin Kay-Yuen
Lo, Leung-Yau
Ni, Bing
Leung, Kwong-Sak
Tsui, Stephen Kwok-Wing
Chan, Ting-Fung
author_sort Lou, Shao-Ke
collection PubMed
description BACKGROUND: RNA sequencing (RNA-seq) measures gene expression levels and permits splicing analysis. Many existing aligners are capable of mapping millions of sequencing reads onto a reference genome. For reads that can be mapped to multiple positions along the reference genome (multireads), these aligners may either randomly assign them to a location or discard them altogether. Either choice could bias downstream analyses. Meanwhile, challenges remain in aligning reads that span splice junctions. Existing splicing-aware aligners that rely on read counts to identify junction sites are inevitably affected by sequencing depth. RESULTS: The distance between the aligned positions of paired-end (PE) reads, or between the two parts of a spliced read, depends on the experimental protocol and on gene structure. We propose a new method that employs an empirical geometric-tail (GT) distribution of intron lengths to make a rational choice in multiread selection and splice-site detection, according to the aligned distances from PE and spliced reads. CONCLUSIONS: GT models that combine sequence similarity from alignment with the probability given by the length distribution could accurately determine the location of both multireads and spliced reads.
format Online
Article
Text
id pubmed-3226252
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3226252 2011-11-30 Detection of splicing events and multiread locations from RNA-seq data based on a geometric-tail (GT) distribution of intron length Lou, Shao-Ke Li, Jing-Woei Qin, Hao Yim, Aldrin Kay-Yuen Lo, Leung-Yau Ni, Bing Leung, Kwong-Sak Tsui, Stephen Kwok-Wing Chan, Ting-Fung BMC Bioinformatics Proceedings BACKGROUND: RNA sequencing (RNA-seq) measures gene expression levels and permits splicing analysis. Many existing aligners are capable of mapping millions of sequencing reads onto a reference genome. For reads that can be mapped to multiple positions along the reference genome (multireads), these aligners may either randomly assign them to a location or discard them altogether. Either choice could bias downstream analyses. Meanwhile, challenges remain in aligning reads that span splice junctions. Existing splicing-aware aligners that rely on read counts to identify junction sites are inevitably affected by sequencing depth. RESULTS: The distance between the aligned positions of paired-end (PE) reads, or between the two parts of a spliced read, depends on the experimental protocol and on gene structure. We propose a new method that employs an empirical geometric-tail (GT) distribution of intron lengths to make a rational choice in multiread selection and splice-site detection, according to the aligned distances from PE and spliced reads. CONCLUSIONS: GT models that combine sequence similarity from alignment with the probability given by the length distribution could accurately determine the location of both multireads and spliced reads. BioMed Central 2011-07-27 /pmc/articles/PMC3226252/ /pubmed/21988959 http://dx.doi.org/10.1186/1471-2105-12-S5-S2 Text en Copyright ©2011 Lou et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Detection of splicing events and multiread locations from RNA-seq data based on a geometric-tail (GT) distribution of intron length
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3226252/
https://www.ncbi.nlm.nih.gov/pubmed/21988959
http://dx.doi.org/10.1186/1471-2105-12-S5-S2