
COSSMO: predicting competitive alternative splice site selection using deep learning

MOTIVATION: Alternative splice site selection is inherently competitive, and the probability that a given splice site is used also depends on the strength of neighboring sites. Here, we present a new model named the competitive splice site model (COSSMO), which explicitly accounts for these competit...
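
The abstract describes COSSMO as predicting a percent selected index (PSI) distribution over any number of putative splice sites, so that sites compete for usage. The following is a minimal sketch of that idea, not the authors' implementation: it assumes each candidate site has already been given a scalar score (in COSSMO this would come from the sequence network) and normalizes the scores with a softmax. The function names `psi_distribution`, `top1_accuracy` and `r_squared`, and the pooled definition of R² used here, are illustrative assumptions rather than details taken from the paper.

```python
import math


def psi_distribution(scores):
    """Softmax over per-site scores: one PSI value per candidate splice site.

    The scores are hypothetical; any number of candidate sites is allowed.
    """
    if not scores:
        raise ValueError("need at least one candidate splice site")
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def top1_accuracy(predicted, observed):
    """Fraction of events where the predicted most-used site matches the observed one."""
    hits = sum(argmax(p) == argmax(o) for p, o in zip(predicted, observed))
    return hits / len(predicted)


def r_squared(predicted, observed):
    """Coefficient of determination, pooled over all sites of all events (an assumption)."""
    flat_p = [x for event in predicted for x in event]
    flat_o = [x for event in observed for x in event]
    mean_o = sum(flat_o) / len(flat_o)
    ss_res = sum((o - p) ** 2 for p, o in zip(flat_p, flat_o))
    ss_tot = sum((o - mean_o) ** 2 for o in flat_o)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Three candidate acceptor sites competing for one fixed donor site.
    baseline = psi_distribution([2.0, 1.0, 0.1])
    # Strengthening the second site lowers the PSI of the first one.
    stronger_neighbor = psi_distribution([2.0, 3.0, 0.1])
    print(baseline, stronger_neighbor)
```

Because the softmax normalizes over all candidates in an event, raising one site's score necessarily lowers the PSI of the others, which is the competitive effect the abstract refers to.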


Bibliographic Details
Main authors: Bretschneider, Hannes, Gandhi, Shreshth, Deshwar, Amit G, Zuberi, Khalid, Frey, Brendan J
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022534/
https://www.ncbi.nlm.nih.gov/pubmed/29949959
http://dx.doi.org/10.1093/bioinformatics/bty244
author Bretschneider, Hannes
Gandhi, Shreshth
Deshwar, Amit G
Zuberi, Khalid
Frey, Brendan J
collection PubMed
description MOTIVATION: Alternative splice site selection is inherently competitive, and the probability that a given splice site is used also depends on the strength of neighboring sites. Here, we present a new model named the competitive splice site model (COSSMO), which explicitly accounts for these competitive effects and predicts the percent selected index (PSI) distribution over any number of putative splice sites. We model an alternative splicing event as the choice of a 3′ acceptor site conditional on a fixed upstream 5′ donor site or the choice of a 5′ donor site conditional on a fixed 3′ acceptor site. We build four different architectures that use convolutional layers, communication layers, long short-term memory and residual networks, respectively, to learn relevant motifs from sequence alone. We also construct a new dataset from genome annotations and RNA-Seq read data that we use to train our model. RESULTS: COSSMO predicts the most frequently used splice site with an accuracy of 70% on unseen test data, and achieves an R² of 0.6 in modeling the PSI distribution. We visualize the motifs that COSSMO learns from sequence and show that COSSMO recognizes the consensus splice site sequences and many known splicing factors with high specificity. AVAILABILITY AND IMPLEMENTATION: Model predictions, our training dataset, and code are available from http://cossmo.genes.toronto.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-6022534
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6022534 2018-07-10 COSSMO: predicting competitive alternative splice site selection using deep learning Bretschneider, Hannes Gandhi, Shreshth Deshwar, Amit G Zuberi, Khalid Frey, Brendan J Bioinformatics ISMB 2018–Intelligent Systems for Molecular Biology Proceedings Oxford University Press 2018-07-01 2018-06-27 /pmc/articles/PMC6022534/ /pubmed/29949959 http://dx.doi.org/10.1093/bioinformatics/bty244 Text en © The Author(s) 2018. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title COSSMO: predicting competitive alternative splice site selection using deep learning
topic ISMB 2018–Intelligent Systems for Molecular Biology Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022534/
https://www.ncbi.nlm.nih.gov/pubmed/29949959
http://dx.doi.org/10.1093/bioinformatics/bty244