MSLP: mRNA subcellular localization predictor based on machine learning techniques
Main authors: | Musleh, Saleh; Islam, Mohammad Tariqul; Qureshi, Rizwan; Alajez, Nehad M.; Alam, Tanvir |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10035125/ https://www.ncbi.nlm.nih.gov/pubmed/36949389 http://dx.doi.org/10.1186/s12859-023-05232-0 |
_version_ | 1784911353212829696 |
---|---|
author | Musleh, Saleh; Islam, Mohammad Tariqul; Qureshi, Rizwan; Alajez, Nehad M.; Alam, Tanvir |
collection | PubMed |
description | BACKGROUND: Subcellular localization of messenger RNAs (mRNAs) plays a pivotal role in the regulation of gene expression, cell migration, and cellular adaptation. Experimental techniques for pinpointing the subcellular localization of mRNAs are laborious, time-consuming, and expensive. Therefore, in silico approaches for this purpose are attracting considerable attention in the RNA community. METHODS: In this article, we propose MSLP, a machine learning-based method to predict the subcellular localization of mRNAs. We propose a novel combination of four feature types, representing k-mers, pseudo k-tuple nucleotide composition (PseKNC), physicochemical properties of nucleotides, and a 3D representation of sequences based on the Z-curve transformation, which are fed into machine learning algorithms to predict mRNA subcellular localization. RESULTS: Using the combination of the above-mentioned features, ensemble-based models achieved state-of-the-art results in mRNA subcellular localization prediction tasks on multiple benchmark datasets. We evaluated the performance of our method across ten subcellular locations: cytoplasm, nucleus, endoplasmic reticulum (ER), extracellular region (ExR), mitochondria, cytosol, pseudopodium, posterior, exosome, and ribosome. An ablation study showed k-mer and PseKNC features to be more dominant than the other features for predicting cytoplasm, nucleus, and ER localizations, whereas physicochemical properties and Z-curve-based features contributed the most to ExR and mitochondria detection. SHAP-based analysis quantified the relative importance of the features, providing further insight into the proposed approach. AVAILABILITY: We have implemented a Docker container and an API that allow end users to run their own sequences through our model. The datasets, the API code, and the Docker container are shared with the community on GitHub at: https://github.com/smusleh/MSLP. 
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05232-0. |
format | Online Article Text |
id | pubmed-10035125 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-10035125 2023-03-24 MSLP: mRNA subcellular localization predictor based on machine learning techniques. Musleh, Saleh; Islam, Mohammad Tariqul; Qureshi, Rizwan; Alajez, Nehad M.; Alam, Tanvir. BMC Bioinformatics, Research. BioMed Central 2023-03-22 /pmc/articles/PMC10035125/ /pubmed/36949389 http://dx.doi.org/10.1186/s12859-023-05232-0 Text en © The Author(s) 2023, corrected publication 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | MSLP: mRNA subcellular localization predictor based on machine learning techniques |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10035125/ https://www.ncbi.nlm.nih.gov/pubmed/36949389 http://dx.doi.org/10.1186/s12859-023-05232-0 |
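The abstract lists k-mer composition and a Z-curve 3D representation among MSLP's four feature types. As a rough illustration only, the sketch below computes both using their common textbook definitions. It is not taken from the MSLP repository. The function names, the ACGU alphabet, the choice of k, the frequency normalization, and the cumulative per-position form of the Z-curve are all assumptions. MSLP may derive its fixed-length Z-curve descriptors differently.

```python
from itertools import product

# Assumption: sequences are RNA over A, C, G, U; any T is treated as U.
ALPHABET = "ACGU"


def kmer_frequencies(seq: str, k: int = 3) -> list[float]:
    """Hypothetical helper: normalized overlapping k-mer counts.

    Returns 4**k values in lexicographic order over ACGU (AAA, AAC, ..., UUU).
    Windows containing ambiguous bases (e.g. N) are skipped.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
            total += 1
    # Divide by the number of valid windows so the vector sums to 1.
    return [c / total if total else 0.0 for c in counts]


def z_curve(seq: str) -> list[tuple[int, int, int]]:
    """Hypothetical helper: cumulative Z-curve coordinates for each prefix.

    With A_n, C_n, G_n, U_n the base counts in the first n positions:
      x_n = (A_n + G_n) - (C_n + U_n)   purine vs pyrimidine
      y_n = (A_n + C_n) - (G_n + U_n)   amino vs keto
      z_n = (A_n + U_n) - (G_n + C_n)   weak vs strong hydrogen bonding
    """
    seq = seq.upper().replace("T", "U")
    a = c = g = u = 0
    points = []
    for base in seq:
        if base == "A":
            a += 1
        elif base == "C":
            c += 1
        elif base == "G":
            g += 1
        elif base == "U":
            u += 1
        # Ambiguous bases leave the counts, and hence the point, unchanged.
        points.append(((a + g) - (c + u), (a + c) - (g + u), (a + u) - (g + c)))
    return points


if __name__ == "__main__":
    print(len(kmer_frequencies("AUGGCUAGC", k=3)))  # 64
    print(z_curve("AUG"))  # [(1, 1, 1), (0, 0, 2), (1, -1, 1)]
```

Model input needs a fixed-length vector. In practice, therefore, the per-position Z-curve trajectory would have to be summarized, or a windowed or frequency-based variant used instead. This record does not state how MSLP does this, and the k-mer vector would likewise be concatenated with the other feature types before training.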