Discerning novel splice junctions derived from RNA-seq alignment: a deep learning approach

Bibliographic Details
Main authors: Zhang, Yi; Liu, Xinan; MacLeod, James; Liu, Jinze
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6307148/
https://www.ncbi.nlm.nih.gov/pubmed/30591034
http://dx.doi.org/10.1186/s12864-018-5350-1
author Zhang, Yi
Liu, Xinan
MacLeod, James
Liu, Jinze
collection PubMed
description BACKGROUND: Exon splicing is a regulated cellular process in the transcription of protein-coding genes. Technological advancements and cost reductions in RNA sequencing have made quantitative and qualitative assessments of the transcriptome both possible and widely available. RNA-seq provides unprecedented resolution to identify gene structures and resolve the diversity of splicing variants. However, currently available ab initio aligners are vulnerable to spurious alignments caused by random sequence matches and discordance between the sample and the reference genome. As a consequence, a significant set of false-positive exon junction predictions is introduced, which further confuses downstream analyses of splice variant discovery and abundance estimation. RESULTS: In this work, we present a deep-learning-based splice junction sequence classifier, named DeepSplice, which employs convolutional neural networks to classify candidate splice junctions. We show that (I) DeepSplice outperforms state-of-the-art methods for splice site classification when applied to the popular benchmark dataset HS3D, (II) DeepSplice shows high accuracy for splice junction classification with GENCODE annotation, and (III) applying DeepSplice to the putative splice junctions generated by Rail-RNA alignment of 21,504 human RNA-seq samples reduces 43 million candidates to around 3 million highly confident novel splice junctions. CONCLUSIONS: We implemented a model, inferred from the sequences of annotated exon junctions, that classifies splice junctions derived from primary RNA-seq data. Comprehensive benchmarking and testing indicate that the model performs reliably and is broadly usable for classifying novel splice junctions derived from RNA-seq alignment.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-018-5350-1) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6307148
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6307148 2019-01-02 Discerning novel splice junctions derived from RNA-seq alignment: a deep learning approach Zhang, Yi Liu, Xinan MacLeod, James Liu, Jinze BMC Genomics Research Article BioMed Central 2018-12-27 /pmc/articles/PMC6307148/ /pubmed/30591034 http://dx.doi.org/10.1186/s12864-018-5350-1 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Discerning novel splice junctions derived from RNA-seq alignment: a deep learning approach
topic Research Article