
CNNSplice: Robust models for splice site prediction using convolutional neural networks

Bibliographic Details
Main authors: Akpokiro, Victor; Chowdhury, H. M. A. Mohit; Olowofila, Samuel; Nusrat, Raisa; Oluwadare, Oluwatosin
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10250157/
https://www.ncbi.nlm.nih.gov/pubmed/37304005
http://dx.doi.org/10.1016/j.csbj.2023.05.031
Description
Summary: The identification of splice sites, the segments of an RNA gene where noncoding and coding sequences are joined in the 5′ and 3′ directions, is an essential post-transcriptional step for annotating functional genes and is required for studying biological function in eukaryotic organisms through protein production and gene expression. Splice site detection tools have been proposed for this purpose; however, their models target specific use cases and are inefficient or typically not transferable between organisms. Here, we present CNNSplice, a set of deep convolutional neural network models for splice site prediction. Using five-fold cross-validation for model selection, we explore several models based on typical machine learning applications and propose five high-performing models that efficiently predict true and false splice sites (SS) in balanced and imbalanced datasets. Our evaluation results indicate that CNNSplice’s models outperform existing methods across datasets from five organisms. In addition, our generality test shows the ability of CNNSplice’s models to predict and annotate splice sites in new or poorly trained genome datasets, indicating a broad application spectrum. CNNSplice demonstrates improved model prediction, interpretability, and generalizability on genomic datasets compared to existing splice site prediction tools. We have developed a web server for the CNNSplice algorithm, which is publicly accessible at http://www.cnnsplice.online
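
The summary mentions five-fold cross-validation over true and false splice-site sequences. As a rough illustration only, the sketch below shows how such folds might be prepared in plain Python. It is not the CNNSplice code: the A/C/G/T one-hot order, the helper names (`one_hot`, `stratified_k_folds`, `cross_validation_splits`), and the toy labels are all assumptions made for this example, and the model-training step is left out.

```python
import random

# One-hot column order A, C, G, T: an assumption for this sketch,
# not taken from the CNNSplice paper.
ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of four 0/1 values.

    Bases outside ALPHABET (for example N) become an all-zero row.
    """
    return [[1 if base == letter else 0 for letter in ALPHABET]
            for base in seq.upper()]


def stratified_k_folds(labels, k=5, seed=0):
    """Return k lists of sample indices with a similar class balance."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin into the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def cross_validation_splits(labels, k=5, seed=0):
    """Yield (train_indices, validation_indices) once per fold."""
    folds = stratified_k_folds(labels, k, seed)
    for i in range(k):
        validation = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i
                       for idx in fold)
        yield train, validation


# Hypothetical toy labels: 1 = true splice site, 0 = false splice site.
labels = [1] * 10 + [0] * 10
splits = list(cross_validation_splits(labels, k=5, seed=0))
encoded = one_hot("ACGN")
```

In a real pipeline each `(train, validation)` pair would be used to fit and score one candidate model, and the scores averaged over the five folds would guide model selection.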