
Splam: a deep-learning-based splice site predictor that improves spliced alignments

The process of splicing messenger RNA to remove introns plays a central role in creating genes and gene variants. Here we describe Splam, a novel method for predicting splice junctions in DNA based on deep residual convolutional neural networks. Unlike some previous models, Splam looks at a relative...

Bibliographic Details
Main Authors: Chao, Kuan-Hao; Mao, Alan; Salzberg, Steven L; Pertea, Mihaela
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10402160/
https://www.ncbi.nlm.nih.gov/pubmed/37546880
http://dx.doi.org/10.1101/2023.07.27.550754
author Chao, Kuan-Hao
Mao, Alan
Salzberg, Steven L
Pertea, Mihaela
collection PubMed
description The process of splicing messenger RNA to remove introns plays a central role in creating genes and gene variants. Here we describe Splam, a novel method for predicting splice junctions in DNA based on deep residual convolutional neural networks. Unlike some previous models, Splam looks at a relatively limited window of 400 base pairs flanking each splice site, motivated by the observation that the biological process of splicing relies primarily on signals within this window. Additionally, Splam introduces the idea of training the network on donor and acceptor pairs together, based on the principle that the splicing machinery recognizes both ends of each intron at once. We compare Splam’s accuracy to recent state-of-the-art splice site prediction methods, particularly SpliceAI, another method that uses deep neural networks. Our results show that Splam is consistently more accurate than SpliceAI, with an overall accuracy of 96% at predicting human splice junctions. Splam generalizes even to non-human species, including distant ones like the flowering plant Arabidopsis thaliana. Finally, we demonstrate the use of Splam on a novel application: processing the spliced alignments of RNA-seq data to identify and eliminate errors. We show that when used in this manner, Splam yields substantial improvements in the accuracy of downstream transcriptome analysis of both poly(A) and ribo-depleted RNA-seq libraries. Overall, Splam offers a faster and more accurate approach to detecting splice junctions, while also providing a reliable and efficient solution for cleaning up erroneous spliced alignments.
format Online
Article
Text
id pubmed-10402160
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10402160 2023-08-05 bioRxiv Article Cold Spring Harbor Laboratory 2023-07-29 /pmc/articles/PMC10402160/ /pubmed/37546880 http://dx.doi.org/10.1101/2023.07.27.550754 Text en
This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
title Splam: a deep-learning-based splice site predictor that improves spliced alignments
topic Article
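
The description field says that Splam examines a window of 400 base pairs around each splice site and that donor and acceptor sites are handled as a pair. A minimal sketch of how such a paired input could be assembled is given here. It is not taken from the Splam code base: the function names, the symmetric 200 + 200 layout of each window, the 'N' padding, and the one-hot encoding are all illustrative assumptions.

```python
# Hypothetical sketch only: names, window layout, padding, and encoding are
# assumptions for illustration, not Splam's actual implementation.

FLANK = 200  # assumed: 200 nt on each side of a site, giving a 400 nt window

ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def window(seq, site, flank=FLANK):
    """Return 2*flank nucleotides centered on `site` (0 <= site <= len(seq)).

    Positions that fall outside the sequence are padded with 'N'.
    """
    left = site - flank
    right = site + flank
    pad_left = "N" * max(0, -left)
    pad_right = "N" * max(0, right - len(seq))
    return pad_left + seq[max(0, left):min(len(seq), right)] + pad_right


def encode_junction(seq, donor, acceptor, flank=FLANK):
    """Concatenate the donor and acceptor windows and one-hot encode them.

    Unknown bases (including the 'N' padding) become all-zero vectors.
    """
    paired = window(seq, donor, flank) + window(seq, acceptor, flank)
    return [ONE_HOT.get(base, (0, 0, 0, 0)) for base in paired.upper()]
```

With the default `FLANK`, `encode_junction(seq, donor, acceptor)` returns 800 four-element vectors, one 400 nt window per end of the candidate intron, which is the kind of paired input a convolutional network could then score.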