CNNSplice: Robust models for splice site prediction using convolutional neural networks
The identification of splice site, or segments of an RNA gene where noncoding and coding sequences are connected in the 5′ and 3′ directions, is an essential post-transcriptional step for the annotation of functional genes and is required for the study and analysis of biological function in eukaryot...
Main authors: | Akpokiro, Victor; Chowdhury, H. M. A. Mohit; Olowofila, Samuel; Nusrat, Raisa; Oluwadare, Oluwatosin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Research Network of Computational and Structural Biotechnology 2023 |
Subjects: | Method Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10250157/ https://www.ncbi.nlm.nih.gov/pubmed/37304005 http://dx.doi.org/10.1016/j.csbj.2023.05.031 |
_version_ | 1785055694128414720 |
---|---|
author | Akpokiro, Victor; Chowdhury, H. M. A. Mohit; Olowofila, Samuel; Nusrat, Raisa; Oluwadare, Oluwatosin |
author_facet | Akpokiro, Victor; Chowdhury, H. M. A. Mohit; Olowofila, Samuel; Nusrat, Raisa; Oluwadare, Oluwatosin |
author_sort | Akpokiro, Victor |
collection | PubMed |
description | The identification of splice site, or segments of an RNA gene where noncoding and coding sequences are connected in the 5′ and 3′ directions, is an essential post-transcriptional step for the annotation of functional genes and is required for the study and analysis of biological function in eukaryotic organisms through protein production and gene expression. Splice site detection tools have been proposed for this purpose; however, the models of these tools have a specific use case and are inefficiently or typically untransferable between organisms. Here, we present CNNSplice, a set of deep convolutional neural network models for splice site prediction. Using the five-fold cross-validation model selection technique, we explore several models based on typical machine learning applications and propose five high-performing models to efficiently predict the true and false SS in balanced and imbalanced datasets. Our evaluation results indicate that CNNSplice’s models achieve a better performance compared with existing methods across five organisms’ datasets. In addition, our generality test shows CNNSplice’s model ability to predict and annotate splice sites in new or poorly trained genome datasets indicating a broad application spectrum. CNNSplice demonstrates improved model prediction, interpretability, and generalizability on genomic datasets compared to existing splice site prediction tools. We have developed a web server for the CNNSplice algorithm which can be publicly accessed here: http://www.cnnsplice.online |
format | Online Article Text |
id | pubmed-10250157 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Research Network of Computational and Structural Biotechnology |
record_format | MEDLINE/PubMed |
spelling | pubmed-102501572023-06-10 CNNSplice: Robust models for splice site prediction using convolutional neural networks Akpokiro, Victor Chowdhury, H. M. A. Mohit Olowofila, Samuel Nusrat, Raisa Oluwadare, Oluwatosin Comput Struct Biotechnol J Method Article The identification of splice site, or segments of an RNA gene where noncoding and coding sequences are connected in the 5′ and 3′ directions, is an essential post-transcriptional step for the annotation of functional genes and is required for the study and analysis of biological function in eukaryotic organisms through protein production and gene expression. Splice site detection tools have been proposed for this purpose; however, the models of these tools have a specific use case and are inefficiently or typically untransferable between organisms. Here, we present CNNSplice, a set of deep convolutional neural network models for splice site prediction. Using the five-fold cross-validation model selection technique, we explore several models based on typical machine learning applications and propose five high-performing models to efficiently predict the true and false SS in balanced and imbalanced datasets. Our evaluation results indicate that CNNSplice’s models achieve a better performance compared with existing methods across five organisms’ datasets. In addition, our generality test shows CNNSplice’s model ability to predict and annotate splice sites in new or poorly trained genome datasets indicating a broad application spectrum. CNNSplice demonstrates improved model prediction, interpretability, and generalizability on genomic datasets compared to existing splice site prediction tools. 
We have developed a web server for the CNNSplice algorithm which can be publicly accessed here: http://www.cnnsplice.online Research Network of Computational and Structural Biotechnology 2023-05-30 /pmc/articles/PMC10250157/ /pubmed/37304005 http://dx.doi.org/10.1016/j.csbj.2023.05.031 Text en © 2023 The Authors https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Method Article Akpokiro, Victor Chowdhury, H. M. A. Mohit Olowofila, Samuel Nusrat, Raisa Oluwadare, Oluwatosin CNNSplice: Robust models for splice site prediction using convolutional neural networks |
title | CNNSplice: Robust models for splice site prediction using convolutional neural networks |
title_full | CNNSplice: Robust models for splice site prediction using convolutional neural networks |
title_fullStr | CNNSplice: Robust models for splice site prediction using convolutional neural networks |
title_full_unstemmed | CNNSplice: Robust models for splice site prediction using convolutional neural networks |
title_short | CNNSplice: Robust models for splice site prediction using convolutional neural networks |
title_sort | cnnsplice: robust models for splice site prediction using convolutional neural networks |
topic | Method Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10250157/ https://www.ncbi.nlm.nih.gov/pubmed/37304005 http://dx.doi.org/10.1016/j.csbj.2023.05.031 |