Splice Junction Identification using Long Short-Term Memory Neural Networks
Main authors: | Regan, Kevin; Saghafi, Abolfazl; Li, Zhijun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Bentham Science Publishers, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8844938/ https://www.ncbi.nlm.nih.gov/pubmed/35283668 http://dx.doi.org/10.2174/1389202922666211011143008 |
author | Regan, Kevin; Saghafi, Abolfazl; Li, Zhijun |
---|---|
collection | PubMed |
description | BACKGROUND: Splice junctions are key to the transition from pre-messenger RNA to mature messenger RNA in many multi-exon genes, owing to alternative splicing. Because a very high percentage of multi-exon genes undergo alternative splicing, identifying splice junctions is an attractive research topic with important implications. OBJECTIVE: The aim of this paper is to develop a deep learning model capable of identifying splice junctions in RNA sequences, using 13,666 unique sequences of primate RNA. METHODS: A Long Short-Term Memory (LSTM) neural network model is developed that classifies a given sequence as EI (exon-intron splice), IE (intron-exon splice), or N (no splice). The model is trained on groups of trinucleotides, and its performance is evaluated on separate validation and test data to prevent bias. RESULTS: Model performance was measured on the test data using accuracy and F-score. The finalized model achieved an average accuracy of 91.34% and an average F-score of 91.36% over 50 runs. CONCLUSION: Comparisons show that the model is highly competitive with recent Convolutional Neural Network structures. The proposed LSTM model achieves the highest accuracy and F-score among published alternative LSTM structures. |
format | Online Article Text |
id | pubmed-8844938 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Bentham Science Publishers |
record_format | MEDLINE/PubMed |
journal | Curr Genomics, Bentham Science Publishers, 2021-12-30 |
license | © 2021 Bentham Science Publishers. This is an open access article licensed under the terms of the Creative Commons Attribution-Non-Commercial 4.0 International Public License (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/legalcode), which permits unrestricted, non-commercial use, distribution and reproduction in any medium, provided the work is properly cited. |
title | Splice Junction Identification using Long Short-Term Memory Neural Networks |
topic | Article |
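The abstract states only that the LSTM is trained on groups of trinucleotides and that accuracy and F-score are averaged over 50 runs. As a rough illustration, the Python sketch below encodes a sequence as integer trinucleotide tokens (the kind of input an LSTM embedding layer typically consumes) and computes accuracy, a macro-averaged F-score, and a mean over runs. The vocabulary, the `stride` option, the handling of ambiguous bases, the macro averaging, and all function names are assumptions made for illustration, not the authors' published method; the LSTM itself is omitted.

```python
from itertools import product
from statistics import mean

# Class labels from the abstract: exon-intron, intron-exon, no splice.
LABELS = ("EI", "IE", "N")

# Hypothetical vocabulary (an assumption, not the paper's): the 64 trinucleotides
# over A/C/G/T get ids 1..64; 0 is left free for padding and 65 marks any
# triplet that contains an ambiguous base.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]
VOCAB = {tri: i + 1 for i, tri in enumerate(TRINUCLEOTIDES)}
UNKNOWN_ID = len(VOCAB) + 1


def trinucleotide_ids(seq, stride=1):
    """Map a sequence to trinucleotide token ids.

    stride=1 yields overlapping triplets; stride=3 yields non-overlapping ones.
    Which grouping the paper uses is not stated in the abstract.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA (U) or DNA (T) letters
    return [VOCAB.get(seq[i:i + 3], UNKNOWN_ID)
            for i in range(0, len(seq) - 2, stride)]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 scores (assumed; the abstract does not
    say how the F-score is averaged across the three classes)."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


def mean_over_runs(per_run_scores):
    """Average a metric over repeated training runs (the paper reports 50)."""
    return mean(per_run_scores)
```

For example, `trinucleotide_ids("ACGTAC")` returns `[7, 28, 45, 50]`, and with true labels EI, IE, N, N and predictions EI, N, N, N the sketch gives an accuracy of 0.75 and a macro F-score of about 0.6. Macro averaging weights the three classes equally; a support-weighted average would be an equally plausible reading of "average f-score".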