MSLP: mRNA subcellular localization predictor based on machine learning techniques

Bibliographic Details
Main authors: Musleh, Saleh; Islam, Mohammad Tariqul; Qureshi, Rizwan; Alajez, Nehad M.; Alam, Tanvir
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10035125/
https://www.ncbi.nlm.nih.gov/pubmed/36949389
http://dx.doi.org/10.1186/s12859-023-05232-0
collection PubMed
description BACKGROUND: Subcellular localization of messenger RNAs (mRNAs) plays a pivotal role in the regulation of gene expression, in cell migration, and in cellular adaptation. Experimental techniques for pinpointing the subcellular localization of mRNAs are laborious, time-consuming and expensive. Therefore, in silico approaches for this purpose are attracting considerable attention in the RNA community.
METHODS: In this article, we propose MSLP, a machine learning-based method to predict the subcellular localization of mRNAs. We propose a novel combination of four types of features, representing k-mers, pseudo k-tuple nucleotide composition (PseKNC), physicochemical properties of nucleotides, and a 3D representation of sequences based on the Z-curve transformation, which are fed into machine learning algorithms to predict the subcellular localization of mRNAs.
RESULTS: Using the combination of the above-mentioned features, ensemble-based models achieved state-of-the-art results in mRNA subcellular localization prediction tasks on multiple benchmark datasets. We evaluated the performance of our method on ten subcellular locations: cytoplasm, nucleus, endoplasmic reticulum (ER), extracellular region (ExR), mitochondria, cytosol, pseudopodium, posterior, exosome, and ribosome. An ablation study showed k-mer and PseKNC features to be more dominant than the other features for predicting cytoplasm, nucleus, and ER localizations. In contrast, physicochemical properties and Z-curve-based features contributed the most to ExR and mitochondria detection. SHAP-based analysis revealed the relative importance of the features, providing further insight into the proposed approach.
AVAILABILITY: We have implemented a Docker container and an API for end users to run their sequences on our model. The datasets and the code for the API and the Docker container are shared with the community on GitHub at: https://github.com/smusleh/MSLP.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05232-0.
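The METHODS paragraph above combines k-mer composition and a Z-curve 3D representation with PseKNC and physicochemical features. The Python sketch below illustrates only the first two encodings, assuming RNA input over A, C, G and U (T is mapped to U). The function names, the choice of k, the frequency normalization, and the use of the length-normalized Z-curve endpoint as a fixed-length summary are illustrative assumptions, not MSLP's published pipeline. PseKNC and physicochemical features are left out because they depend on property tables not given in this record.

```python
from itertools import product

# Hypothetical helpers sketching two MSLP-style feature types; not the authors' code.
ALPHABET = "ACGU"


def normalize(seq: str) -> str:
    """Upper-case the sequence and map DNA 'T' to RNA 'U'."""
    return seq.upper().replace("T", "U")


def kmer_frequencies(seq: str, k: int) -> list[float]:
    """Relative frequency of each of the 4**k k-mers, in lexicographic order."""
    seq = normalize(seq)
    vocab = ["".join(p) for p in product(ALPHABET, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(vocab)}
    counts = [0] * len(vocab)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:  # skip windows containing N or other ambiguous bases
            counts[index[kmer]] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


def z_curve(seq: str) -> list[tuple[int, int, int]]:
    """Z-curve path: one (x, y, z) point per prefix of the sequence."""
    counts = {base: 0 for base in ALPHABET}
    path = []
    for base in normalize(seq):
        if base in counts:  # ambiguous bases leave the point unchanged
            counts[base] += 1
        a, c, g, u = counts["A"], counts["C"], counts["G"], counts["U"]
        x = (a + g) - (c + u)  # purine vs pyrimidine
        y = (a + c) - (g + u)  # amino vs keto
        z = (a + u) - (g + c)  # weak vs strong hydrogen bonding
        path.append((x, y, z))
    return path


def feature_vector(seq: str, k: int = 3) -> list[float]:
    """Assumed combination: k-mer frequencies plus the length-normalized Z-curve endpoint."""
    path = z_curve(seq)
    if path:
        endpoint = [coord / len(path) for coord in path[-1]]
    else:
        endpoint = [0.0, 0.0, 0.0]
    return kmer_frequencies(seq, k) + endpoint


if __name__ == "__main__":
    vec = feature_vector("AUGGCUACGU", k=2)
    print(len(vec))  # 16 k-mer frequencies + 3 Z-curve coordinates = 19
```

Vectors built this way could then be passed to a classifier, such as the ensemble-based models the abstract reports. The Z-curve coordinates follow the usual definition (purine/pyrimidine, amino/keto, weak/strong hydrogen bonding); how MSLP reduces the full curve to model inputs is not stated in this record.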
id pubmed-10035125
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Published in BMC Bioinformatics (Research) by BioMed Central, 2023-03-22.
© The Author(s) 2023, corrected publication 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
topic Research