Predicting the effect of variants on splicing using Convolutional Neural Networks
Mutations that cause an error in the splicing of a messenger RNA (mRNA) can lead to diseases in humans. Various computational models have been developed to recognize the sequence pattern of the splice sites. In recent studies, Convolutional Neural Network (CNN) architectures were shown to outperform...
Main Authors: | Thanapattheerakul, Thanyathorn; Engchuan, Worrawat; Chan, Jonathan H. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2020 |
Subjects: | Bioinformatics |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7346860/ https://www.ncbi.nlm.nih.gov/pubmed/32704450 http://dx.doi.org/10.7717/peerj.9470 |
_version_ | 1783556480472973312 |
---|---|
author | Thanapattheerakul, Thanyathorn Engchuan, Worrawat Chan, Jonathan H. |
author_sort | Thanapattheerakul, Thanyathorn |
collection | PubMed |
description | Mutations that cause errors in the splicing of messenger RNA (mRNA) can lead to disease in humans. Various computational models have been developed to recognize the sequence patterns of splice sites. In recent studies, Convolutional Neural Network (CNN) architectures were shown to outperform other existing models in predicting splice sites. However, insufficient effort has been put into extending CNN models to predict the effect of genomic variants on mRNA splicing. This study proposes a framework that elaborates on the utility of CNNs for assessing the effect of splice variants and identifying potential disease-causing variants that disrupt the RNA splicing process. Five models, three CNN-based and two based on non-CNN machine learning, were trained and compared using two existing splice site datasets: Genome Wide Human splice sites (GWH) and a dataset provided at the Deep Learning and Artificial Intelligence winter school 2018 (DLAI). The donor sites were also tested on the HSplice tool to evaluate the predictive models. To improve the effectiveness of the predictive models, the two datasets were combined. The CNN model with four convolutional layers showed the best splice site prediction performance, with an AUPRC of 93.4% and 88.8% for donor and acceptor sites, respectively. The effects of variants on splicing were estimated by applying the best model to variant data from the ClinVar database. Based on this estimation, the framework could effectively differentiate pathogenic variants from benign variants (p = 5.9 × 10⁻⁷). These promising results suggest that the proposed framework could be applied in future genetic studies to identify disease-causing loci involving the splicing mechanism. The datasets and Python scripts used in this study are available in the GitHub repository at https://github.com/smiile8888/rna-splice-sites-recognition. |
format | Online Article Text |
id | pubmed-7346860 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7346860 2020-07-22 Predicting the effect of variants on splicing using Convolutional Neural Networks Thanapattheerakul, Thanyathorn Engchuan, Worrawat Chan, Jonathan H. PeerJ Bioinformatics PeerJ Inc. 2020-07-06 /pmc/articles/PMC7346860/ /pubmed/32704450 http://dx.doi.org/10.7717/peerj.9470 Text en ©2020 Thanapattheerakul et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
spellingShingle | Bioinformatics Thanapattheerakul, Thanyathorn Engchuan, Worrawat Chan, Jonathan H. Predicting the effect of variants on splicing using Convolutional Neural Networks |
title | Predicting the effect of variants on splicing using Convolutional Neural Networks |
title_sort | predicting the effect of variants on splicing using convolutional neural networks |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7346860/ https://www.ncbi.nlm.nih.gov/pubmed/32704450 http://dx.doi.org/10.7717/peerj.9470 |