Characterizing RNA Pseudouridylation by Convolutional Neural Networks

Pseudouridine (Ψ) is the most prevalent post-transcriptional RNA modification and is widespread in small cellular RNAs and mRNAs. However, the functions, mechanisms, and precise distribution of Ψs (especially in mRNAs) remain largely unclear. The landscape of Ψs across the transcriptome has no...

Bibliographic Details
Main Authors: He, Xuan; Zhang, Sai; Zhang, Yanqing; Lei, Zhixin; Jiang, Tao; Zeng, Jianyang
Format: Online Article Text
Language: English
Published: Elsevier 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9170758/
https://www.ncbi.nlm.nih.gov/pubmed/33631424
http://dx.doi.org/10.1016/j.gpb.2019.11.015
_version_ 1784721505941192704
author He, Xuan
Zhang, Sai
Zhang, Yanqing
Lei, Zhixin
Jiang, Tao
Zeng, Jianyang
author_facet He, Xuan
Zhang, Sai
Zhang, Yanqing
Lei, Zhixin
Jiang, Tao
Zeng, Jianyang
author_sort He, Xuan
collection PubMed
description Pseudouridine (Ψ) is the most prevalent post-transcriptional RNA modification and is widespread in small cellular RNAs and mRNAs. However, the functions, mechanisms, and precise distribution of Ψs (especially in mRNAs) remain largely unclear. The landscape of Ψs across the transcriptome has not yet been fully delineated. Here, we present a highly effective model based on a convolutional neural network (CNN), called PseudoUridyLation Site Estimator (PULSE), to analyze large-scale profiling data of Ψ sites and characterize the contextual sequence features of pseudouridylation. PULSE, consisting of two alternately stacked convolution and pooling layers followed by a fully-connected neural network, can automatically learn the hidden patterns of pseudouridylation from the local sequence information. Extensive validation tests demonstrated that PULSE can outperform other state-of-the-art prediction methods and achieve high prediction accuracy, thus enabling us to further characterize the transcriptome-wide landscape of Ψ sites. We further showed that the prediction results derived from PULSE can provide novel insights into the functional roles of pseudouridylation, such as the regulation of RNA secondary structure, codon usage, translation, and RNA stability, and the connection to single nucleotide variants. The source code and final model for PULSE are available at https://github.com/mlcb-thu/PULSE.
format Online
Article
Text
id pubmed-9170758
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-91707582022-06-08 Characterizing RNA Pseudouridylation by Convolutional Neural Networks He, Xuan Zhang, Sai Zhang, Yanqing Lei, Zhixin Jiang, Tao Zeng, Jianyang Genomics Proteomics Bioinformatics Method Pseudouridine (Ψ) is the most prevalent post-transcriptional RNA modification and is widespread in small cellular RNAs and mRNAs. However, the functions, mechanisms, and precise distribution of Ψs (especially in mRNAs) still remain largely unclear. The landscape of Ψs across the transcriptome has not yet been fully delineated. Here, we present a highly effective model based on a convolutional neural network (CNN), called PseudoUridyLation Site Estimator (PULSE), to analyze large-scale profiling data of Ψ sites and characterize the contextual sequence features of pseudouridylation. PULSE, consisting of two alternatively-stacked convolution and pooling layers followed by a fully-connected neural network, can automatically learn the hidden patterns of pseudouridylation from the local sequence information. Extensive validation tests demonstrated that PULSE can outperform other state-of-the-art prediction methods and achieve high prediction accuracy, thus enabling us to further characterize the transcriptome-wide landscape of Ψ sites. We further showed that the prediction results derived from PULSE can provide novel insights into understanding the functional roles of pseudouridylation, such as the regulations of RNA secondary structure, codon usage, translation, and RNA stability, and the connection to single nucleotide variants. The source code and final model for PULSE are available at https://github.com/mlcb-thu/PULSE. Elsevier 2021-10 2021-02-23 /pmc/articles/PMC9170758/ /pubmed/33631424 http://dx.doi.org/10.1016/j.gpb.2019.11.015 Text en © 2021 The Author https://creativecommons.org/licenses/by/4.0/This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Method
He, Xuan
Zhang, Sai
Zhang, Yanqing
Lei, Zhixin
Jiang, Tao
Zeng, Jianyang
Characterizing RNA Pseudouridylation by Convolutional Neural Networks
title Characterizing RNA Pseudouridylation by Convolutional Neural Networks
title_full Characterizing RNA Pseudouridylation by Convolutional Neural Networks
title_fullStr Characterizing RNA Pseudouridylation by Convolutional Neural Networks
title_full_unstemmed Characterizing RNA Pseudouridylation by Convolutional Neural Networks
title_short Characterizing RNA Pseudouridylation by Convolutional Neural Networks
title_sort characterizing rna pseudouridylation by convolutional neural networks
topic Method
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9170758/
https://www.ncbi.nlm.nih.gov/pubmed/33631424
http://dx.doi.org/10.1016/j.gpb.2019.11.015
work_keys_str_mv AT hexuan characterizingrnapseudouridylationbyconvolutionalneuralnetworks
AT zhangsai characterizingrnapseudouridylationbyconvolutionalneuralnetworks
AT zhangyanqing characterizingrnapseudouridylationbyconvolutionalneuralnetworks
AT leizhixin characterizingrnapseudouridylationbyconvolutionalneuralnetworks
AT jiangtao characterizingrnapseudouridylationbyconvolutionalneuralnetworks
AT zengjianyang characterizingrnapseudouridylationbyconvolutionalneuralnetworks