CNNSplice: Robust models for splice site prediction using convolutional neural networks

The identification of splice sites, or segments of an RNA gene where noncoding and coding sequences are connected in the 5′ and 3′ directions, is an essential post-transcriptional step for the annotation of functional genes and is required for the study and analysis of biological function in eukaryot...

Bibliographic Details
Main authors: Akpokiro, Victor; Chowdhury, H. M. A. Mohit; Olowofila, Samuel; Nusrat, Raisa; Oluwadare, Oluwatosin
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10250157/
https://www.ncbi.nlm.nih.gov/pubmed/37304005
http://dx.doi.org/10.1016/j.csbj.2023.05.031
_version_ 1785055694128414720
author Akpokiro, Victor
Chowdhury, H. M. A. Mohit
Olowofila, Samuel
Nusrat, Raisa
Oluwadare, Oluwatosin
author_facet Akpokiro, Victor
Chowdhury, H. M. A. Mohit
Olowofila, Samuel
Nusrat, Raisa
Oluwadare, Oluwatosin
author_sort Akpokiro, Victor
collection PubMed
description The identification of splice sites, or segments of an RNA gene where noncoding and coding sequences are connected in the 5′ and 3′ directions, is an essential post-transcriptional step for the annotation of functional genes and is required for the study and analysis of biological function in eukaryotic organisms through protein production and gene expression. Splice site detection tools have been proposed for this purpose; however, the models behind these tools target specific use cases and are typically inefficient or not transferable between organisms. Here, we present CNNSplice, a set of deep convolutional neural network models for splice site prediction. Using five-fold cross-validation for model selection, we explore several models based on typical machine learning applications and propose five high-performing models that efficiently predict true and false splice sites (SS) in balanced and imbalanced datasets. Our evaluation results indicate that CNNSplice’s models outperform existing methods across datasets from five organisms. In addition, our generality test shows the ability of CNNSplice’s models to predict and annotate splice sites in new or poorly trained genome datasets, indicating a broad application spectrum. CNNSplice demonstrates improved model prediction, interpretability, and generalizability on genomic datasets compared to existing splice site prediction tools. We have developed a web server for the CNNSplice algorithm, which can be publicly accessed here: http://www.cnnsplice.online
format Online
Article
Text
id pubmed-10250157
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-102501572023-06-10 CNNSplice: Robust models for splice site prediction using convolutional neural networks Akpokiro, Victor Chowdhury, H. M. A. Mohit Olowofila, Samuel Nusrat, Raisa Oluwadare, Oluwatosin Comput Struct Biotechnol J Method Article The identification of splice sites, or segments of an RNA gene where noncoding and coding sequences are connected in the 5′ and 3′ directions, is an essential post-transcriptional step for the annotation of functional genes and is required for the study and analysis of biological function in eukaryotic organisms through protein production and gene expression. Splice site detection tools have been proposed for this purpose; however, the models behind these tools target specific use cases and are typically inefficient or not transferable between organisms. Here, we present CNNSplice, a set of deep convolutional neural network models for splice site prediction. Using five-fold cross-validation for model selection, we explore several models based on typical machine learning applications and propose five high-performing models that efficiently predict true and false splice sites (SS) in balanced and imbalanced datasets. Our evaluation results indicate that CNNSplice’s models outperform existing methods across datasets from five organisms. In addition, our generality test shows the ability of CNNSplice’s models to predict and annotate splice sites in new or poorly trained genome datasets, indicating a broad application spectrum. CNNSplice demonstrates improved model prediction, interpretability, and generalizability on genomic datasets compared to existing splice site prediction tools.
We have developed a web server for the CNNSplice algorithm, which can be publicly accessed here: http://www.cnnsplice.online Research Network of Computational and Structural Biotechnology 2023-05-30 /pmc/articles/PMC10250157/ /pubmed/37304005 http://dx.doi.org/10.1016/j.csbj.2023.05.031 Text en © 2023 The Authors https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Method Article
Akpokiro, Victor
Chowdhury, H. M. A. Mohit
Olowofila, Samuel
Nusrat, Raisa
Oluwadare, Oluwatosin
CNNSplice: Robust models for splice site prediction using convolutional neural networks
title CNNSplice: Robust models for splice site prediction using convolutional neural networks
title_full CNNSplice: Robust models for splice site prediction using convolutional neural networks
title_fullStr CNNSplice: Robust models for splice site prediction using convolutional neural networks
title_full_unstemmed CNNSplice: Robust models for splice site prediction using convolutional neural networks
title_short CNNSplice: Robust models for splice site prediction using convolutional neural networks
title_sort cnnsplice: robust models for splice site prediction using convolutional neural networks
topic Method Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10250157/
https://www.ncbi.nlm.nih.gov/pubmed/37304005
http://dx.doi.org/10.1016/j.csbj.2023.05.031
work_keys_str_mv AT akpokirovictor cnnsplicerobustmodelsforsplicesitepredictionusingconvolutionalneuralnetworks
AT chowdhuryhmamohit cnnsplicerobustmodelsforsplicesitepredictionusingconvolutionalneuralnetworks
AT olowofilasamuel cnnsplicerobustmodelsforsplicesitepredictionusingconvolutionalneuralnetworks
AT nusratraisa cnnsplicerobustmodelsforsplicesitepredictionusingconvolutionalneuralnetworks
AT oluwadareoluwatosin cnnsplicerobustmodelsforsplicesitepredictionusingconvolutionalneuralnetworks