RNAlight: a machine learning model to identify nucleotide features determining RNA subcellular localization

Different RNAs have distinct subcellular localizations. However, nucleotide features that determine these distinct distributions of lncRNAs and mRNAs have yet to be fully addressed. Here, we develop RNAlight, a machine learning model based on LightGBM, to identify nucleotide k-mers contributing to t...

Bibliographic Details
Main authors: Yuan, Guo-Hua, Wang, Ying, Wang, Guang-Zhong, Yang, Li
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects: Problem Solving Protocol
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9851306/
https://www.ncbi.nlm.nih.gov/pubmed/36464487
http://dx.doi.org/10.1093/bib/bbac509
author Yuan, Guo-Hua
Wang, Ying
Wang, Guang-Zhong
Yang, Li
collection PubMed
description Different RNAs have distinct subcellular localizations. However, nucleotide features that determine these distinct distributions of lncRNAs and mRNAs have yet to be fully addressed. Here, we develop RNAlight, a machine learning model based on LightGBM, to identify nucleotide k-mers contributing to the subcellular localizations of mRNAs and lncRNAs. With the Tree SHAP algorithm, RNAlight extracts nucleotide features for cytoplasmic or nuclear localization of RNAs, indicating the sequence basis for distinct RNA subcellular localizations. By assembling k-mers to sequence features and subsequently mapping to known RBP-associated motifs, different types of sequence features and their associated RBPs were additionally uncovered for lncRNAs and mRNAs with distinct subcellular localizations. Finally, we extended RNAlight to precisely predict the subcellular localizations of other types of RNAs, including snRNAs, snoRNAs and different circular RNA transcripts, suggesting the generality of using RNAlight for RNA subcellular localization prediction.
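The description names nucleotide k-mers as the features the model works with, but this record does not say how they are computed. As a minimal sketch, assuming the features are normalized counts of overlapping k-mers, the idea can be illustrated as follows. The function name `kmer_frequencies`, the default `k`, and the U-to-T conversion are illustrative assumptions, not taken from RNAlight itself.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=3, alphabet="ACGT"):
    """Return normalized overlapping k-mer frequencies for one sequence.

    Hypothetical helper for illustration only; not RNAlight's own code.
    """
    # Assumption: treat RNA (U) and cDNA (T) spellings as equivalent.
    seq = sequence.upper().replace("U", "T")

    # Fixed vocabulary of all 4**k k-mers, so every sequence maps to a
    # vector with the same keys in the same order.
    vocabulary = ["".join(p) for p in product(alphabet, repeat=k)]

    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

    # Windows containing other symbols (e.g. N) are not in the vocabulary
    # and are therefore left out of the normalization.
    total = sum(counts[kmer] for kmer in vocabulary)
    return {
        kmer: (counts[kmer] / total if total else 0.0)
        for kmer in vocabulary
    }


if __name__ == "__main__":
    features = kmer_frequencies("ACGUACGU", k=2)
    # Show only the k-mers that actually occur in the toy sequence.
    print({kmer: round(f, 3) for kmer, f in features.items() if f > 0})
```

Vectors of this kind, one per transcript, are the sort of tabular input that a gradient-boosted tree classifier such as LightGBM accepts, and per-feature attributions from a method such as Tree SHAP would then be expressed per k-mer.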
format Online
Article
Text
id pubmed-9851306
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9851306 2023-01-20 Brief Bioinform Problem Solving Protocol
Oxford University Press 2022-12-03 /pmc/articles/PMC9851306/ /pubmed/36464487 http://dx.doi.org/10.1093/bib/bbac509 Text en
© The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by-nc/4.0/
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title RNAlight: a machine learning model to identify nucleotide features determining RNA subcellular localization
topic Problem Solving Protocol