
Predicting the effect of variants on splicing using Convolutional Neural Networks

Mutations that cause an error in the splicing of a messenger RNA (mRNA) can lead to diseases in humans. Various computational models have been developed to recognize the sequence pattern of the splice sites. In recent studies, Convolutional Neural Network (CNN) architectures were shown to outperform...


Bibliographic Details
Main Authors: Thanapattheerakul, Thanyathorn; Engchuan, Worrawat; Chan, Jonathan H.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2020
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7346860/
https://www.ncbi.nlm.nih.gov/pubmed/32704450
http://dx.doi.org/10.7717/peerj.9470
author Thanapattheerakul, Thanyathorn
Engchuan, Worrawat
Chan, Jonathan H.
collection PubMed
description Mutations that cause errors in the splicing of messenger RNA (mRNA) can lead to disease in humans. Various computational models have been developed to recognize the sequence patterns of splice sites. In recent studies, Convolutional Neural Network (CNN) architectures were shown to outperform other existing models in predicting splice sites. However, insufficient effort has been put into extending CNN models to predict the effect of genomic variants on mRNA splicing. This study proposes a framework that uses CNNs to assess the effect of variants on splicing and thereby identify potential disease-causing variants that disrupt the RNA splicing process. Five models, three CNN-based and two based on non-CNN machine learning methods, were trained and compared using two existing splice site datasets: Genome Wide Human splice sites (GWH) and a dataset provided at the Deep Learning and Artificial Intelligence winter school 2018 (DLAI). The donor sites were also tested with the HSplice tool to evaluate the predictive models. To improve the effectiveness of the predictive models, the two datasets were combined. The CNN model with four convolutional layers showed the best splice site prediction performance, with an AUPRC of 93.4% and 88.8% for donor and acceptor sites, respectively. The effects of variants on splicing were estimated by applying the best model to variant data from the ClinVar database. Based on this estimation, the framework could effectively differentiate pathogenic variants from benign variants (p = 5.9 × 10⁻⁷). These promising results suggest that the proposed framework could be applied in future genetic studies to identify disease-causing loci involving the splicing mechanism. The datasets and Python scripts used in this study are available on GitHub at https://github.com/smiile8888/rna-splice-sites-recognition.
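The abstract says variant effects were estimated by applying the trained model to variant data, but this record does not spell out the scoring rule. A minimal sketch of one common approach follows, assuming the effect is taken as the change in predicted splice-site score between the reference and the variant sequence. The function names (`one_hot`, `apply_snv`, `toy_donor_score`, `variant_effect`) are hypothetical, and the toy scorer is a stand-in for a trained CNN, not the authors' model.

```python
# Illustrative sketch only. Assumption: variant effect = score(alt) - score(ref).
# The scorer here is a toy stand-in, NOT the CNN described in the record.
from typing import Callable, List

BASES = "ACGT"
Encoded = List[List[int]]


def one_hot(seq: str) -> Encoded:
    """Encode a DNA string as one-hot rows in A, C, G, T order.

    Any other symbol (for example N) becomes an all-zero row.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def apply_snv(seq: str, pos: int, ref: str, alt: str) -> str:
    """Return seq with the base at 0-based pos changed from ref to alt."""
    if seq[pos].upper() != ref.upper():
        raise ValueError(f"reference mismatch at {pos}: {seq[pos]!r} != {ref!r}")
    return seq[:pos] + alt + seq[pos + 1:]


def toy_donor_score(encoded: Encoded, site: int) -> float:
    """Hypothetical scorer: fraction of the canonical donor dinucleotide GT
    present at positions site and site + 1. A real model would go here."""
    g = encoded[site][BASES.index("G")]
    t = encoded[site + 1][BASES.index("T")]
    return (g + t) / 2.0


def variant_effect(seq: str, pos: int, ref: str, alt: str,
                   score: Callable[[Encoded, int], float], site: int) -> float:
    """Score change caused by the variant; negative means a weaker site."""
    ref_score = score(one_hot(seq), site)
    alt_score = score(one_hot(apply_snv(seq, pos, ref, alt)), site)
    return alt_score - ref_score


# Toy donor window "CAG|GTAAGT": the intron starts at index 3.
window = "CAGGTAAGT"
disrupting = variant_effect(window, 3, "G", "A", toy_donor_score, site=3)
neutral = variant_effect(window, 0, "C", "T", toy_donor_score, site=3)
print(disrupting, neutral)
```

Under this assumption, score changes for pathogenic and benign variants could then be compared with a two-sample statistical test; the record reports a p-value for that comparison but does not name the test used.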
format Online
Article
Text
id pubmed-7346860
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-7346860 2020-07-22 Predicting the effect of variants on splicing using Convolutional Neural Networks Thanapattheerakul, Thanyathorn; Engchuan, Worrawat; Chan, Jonathan H. PeerJ Bioinformatics PeerJ Inc. 2020-07-06 /pmc/articles/PMC7346860/ /pubmed/32704450 http://dx.doi.org/10.7717/peerj.9470 Text en ©2020 Thanapattheerakul et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title Predicting the effect of variants on splicing using Convolutional Neural Networks
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7346860/
https://www.ncbi.nlm.nih.gov/pubmed/32704450
http://dx.doi.org/10.7717/peerj.9470