BoT-Net: a lightweight bag of tricks-based neural network for efficient LncRNA–miRNA interaction prediction

Bibliographic Details
Main Authors: Asim, Muhammad Nabeel, Ibrahim, Muhammad Ali, Zehe, Christoph, Trygg, Johan, Dengel, Andreas, Ahmed, Sheraz
Format: Online Article Text
Language: English
Published: Springer Nature Singapore 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9581873/
https://www.ncbi.nlm.nih.gov/pubmed/35947255
http://dx.doi.org/10.1007/s12539-022-00535-x
_version_ 1784812725526855680
author Asim, Muhammad Nabeel
Ibrahim, Muhammad Ali
Zehe, Christoph
Trygg, Johan
Dengel, Andreas
Ahmed, Sheraz
author_sort Asim, Muhammad Nabeel
collection PubMed
description BACKGROUND AND OBJECTIVE: Interactions of long non-coding ribonucleic acids (lncRNAs) with micro-ribonucleic acids (miRNAs) play an essential role in gene regulation and in cellular metabolic and pathological processes. Existing purely sequence-based computational approaches lack robustness and efficiency, mainly because of the high length variability of lncRNA sequences. Hence, the prime focus of the current study is to find optimal length trade-offs for lncRNA sequences of highly variable length.
METHOD: The paper explores diverse copy padding and sequence truncation approaches in depth, and presents a novel idea of using only subregions of lncRNA sequences to generate fixed-length lncRNA sequences. Furthermore, it presents a novel bag of tricks-based deep learning approach, “BoT-Net”, which combines a single-layer long short-term memory network regularized through DropConnect to capture higher-order residue dependencies, pooling to retain the most salient features, normalization to prevent exploding and vanishing gradients, learning rate decay, and dropout to regularize a precise neural network for lncRNA–miRNA interaction prediction.
RESULTS: BoT-Net outperforms the state-of-the-art lncRNA–miRNA interaction prediction approach by 2%, 8%, and 4% in terms of accuracy, specificity, and Matthews correlation coefficient, respectively. Furthermore, a case study analysis indicates that BoT-Net also outperforms the state-of-the-art lncRNA–protein interaction predictor on a benchmark dataset by 10% in accuracy, 19% in sensitivity, 6% in specificity, 14% in precision, and 26% in Matthews correlation coefficient.
CONCLUSION: In the benchmark lncRNA–miRNA interaction prediction dataset, lncRNA sequence lengths vary from 213 residues to 22,743 residues, and in the benchmark lncRNA–protein interaction prediction dataset, they vary from 15 residues to 1504 residues. For sequences of such highly variable length, fixed-length generation using copy padding introduces significant bias, which makes a large number of lncRNA sequences nearly identical to each other and eventually derails classifier generalizability. Empirical evaluation reveals that the first 50 residues of long lncRNA sequences alone contain a highly informative distribution for lncRNA–miRNA interaction prediction, a crucial finding that the proposed BoT-Net approach exploits to optimize fixed-length lncRNA sequence generation.
AVAILABILITY: The BoT-Net web server can be accessed at https://sds_genetic_analysis.opendfki.de/lncmiRNA/.
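The two fixed-length strategies contrasted in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, and "copy padding" is assumed here to mean repeating a sequence's own residues until the target length is reached. Only the 50-residue window from the start of the sequence comes from the abstract.

```python
def copy_pad(seq: str, target_len: int) -> str:
    """Assumed form of copy padding: repeat the sequence's own residues
    until target_len is reached, then cut to exactly target_len."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    repeats = -(-target_len // len(seq))  # ceiling division
    return (seq * repeats)[:target_len]


def head_subregion(seq: str, n_residues: int = 50) -> str:
    """Keep only the first n_residues of the sequence (start-region truncation)."""
    return seq[:n_residues]


def fixed_length(seq: str, target_len: int = 50) -> str:
    """Hypothetical combination: use the starting subregion, and copy-pad
    only when the sequence is shorter than target_len."""
    head = head_subregion(seq, target_len)
    return head if len(head) == target_len else copy_pad(head, target_len)


if __name__ == "__main__":
    long_seq = "ACGU" * 500   # 2000 residues, toy example
    short_seq = "ACGUUGCA"    # 8 residues, toy example
    print(len(fixed_length(long_seq)), len(fixed_length(short_seq)))
```

With a 50-residue target, a sequence thousands of residues long is represented by its starting region only, while copy padding to a very large target length would fill most of a short sequence's representation with repeats of itself, which is the bias the abstract describes.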
format Online
Article
Text
id pubmed-9581873
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Springer Nature Singapore
record_format MEDLINE/PubMed
spelling pubmed-9581873 2022-10-21 BoT-Net: a lightweight bag of tricks-based neural network for efficient LncRNA–miRNA interaction prediction Asim, Muhammad Nabeel Ibrahim, Muhammad Ali Zehe, Christoph Trygg, Johan Dengel, Andreas Ahmed, Sheraz Interdiscip Sci Original Research Article Springer Nature Singapore 2022-08-10 2022 /pmc/articles/PMC9581873/ /pubmed/35947255 http://dx.doi.org/10.1007/s12539-022-00535-x Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
title BoT-Net: a lightweight bag of tricks-based neural network for efficient LncRNA–miRNA interaction prediction
topic Original Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9581873/
https://www.ncbi.nlm.nih.gov/pubmed/35947255
http://dx.doi.org/10.1007/s12539-022-00535-x