PseUdeep: RNA Pseudouridine Site Identification with Deep Learning Algorithm

Background: Pseudouridine (Ψ) is a common ribonucleotide modification that plays a significant role in many biological processes. The identification of Ψ modification sites is of great significance for disease mechanism and biological processes research in which machine learning algorithms are desir...

Bibliographic Details
Main Authors:	Zhuang, Jujuan; Liu, Danyang; Lin, Meng; Qiu, Wenjing; Liu, Jinyang; Chen, Size
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2021
Subjects:	Genetics
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8637112/ https://www.ncbi.nlm.nih.gov/pubmed/34868261 http://dx.doi.org/10.3389/fgene.2021.773882

_version_	1784608676175151104
author	Zhuang, Jujuan; Liu, Danyang; Lin, Meng; Qiu, Wenjing; Liu, Jinyang; Chen, Size
author_facet	Zhuang, Jujuan; Liu, Danyang; Lin, Meng; Qiu, Wenjing; Liu, Jinyang; Chen, Size
author_sort	Zhuang, Jujuan
collection	PubMed
description	Background: Pseudouridine (Ψ) is a common ribonucleotide modification that plays a significant role in many biological processes. Identifying Ψ modification sites is of great significance for research on disease mechanisms and biological processes; machine learning algorithms are desirable for this task because laboratory techniques are expensive and time-consuming. Results: In this work, we propose a deep learning framework, called PseUdeep, to identify Ψ sites in three species: H. sapiens, S. cerevisiae, and M. musculus. Three encoding methods are used to extract features from RNA sequences: one-hot encoding, K-tuple nucleotide frequency pattern, and position-specific nucleotide composition. The three feature matrices are convolved twice and fed into a capsule neural network and a bidirectional gated recurrent unit network with a self-attention mechanism for classification. Conclusion: Compared with other state-of-the-art methods, our model achieves the highest prediction accuracy on the independent testing data set S-200, where accuracy improves by 12.38%, and on the independent testing data set H-200, where accuracy improves by 0.68%. Moreover, the dimensions of the features we derive from the RNA sequences are only 109, 109, and 119 in H. sapiens, M. musculus, and S. cerevisiae, which is much smaller than those used in traditional algorithms. On evaluation via tenfold cross-validation and two independent testing data sets, PseUdeep outperforms the best traditional machine learning model available. PseUdeep source code and data sets are available at https://github.com/dan111262/PseUdeep.
format	Online Article Text
id	pubmed-8637112
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-8637112 2021-12-03 PseUdeep: RNA Pseudouridine Site Identification with Deep Learning Algorithm Zhuang, Jujuan; Liu, Danyang; Lin, Meng; Qiu, Wenjing; Liu, Jinyang; Chen, Size Front Genet Genetics Frontiers Media S.A. 2021-11-18 /pmc/articles/PMC8637112/ /pubmed/34868261 http://dx.doi.org/10.3389/fgene.2021.773882 Text en Copyright © 2021 Zhuang, Liu, Lin, Qiu, Liu and Chen. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle	Genetics Zhuang, Jujuan Liu, Danyang Lin, Meng Qiu, Wenjing Liu, Jinyang Chen, Size PseUdeep: RNA Pseudouridine Site Identification with Deep Learning Algorithm
title	PseUdeep: RNA Pseudouridine Site Identification with Deep Learning Algorithm
title_sort	pseudeep: rna pseudouridine site identification with deep learning algorithm
topic	Genetics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8637112/ https://www.ncbi.nlm.nih.gov/pubmed/34868261 http://dx.doi.org/10.3389/fgene.2021.773882
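
The abstract in this record names three sequence encodings. As a rough illustration of the first two, the following minimal sketch shows a one-hot encoding and a plain k-tuple frequency count for an RNA string. The function names, the `ACGU` column order, and the handling of unknown symbols are assumptions made for this example; the paper's K-tuple nucleotide frequency pattern may be defined differently, and position-specific nucleotide composition is omitted because it depends on labeled training sets. For the authors' actual implementation, see the repository linked in the description.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGU"  # assumed column order for this sketch


def one_hot(seq):
    """Encode each nucleotide as a 4-element 0/1 row (A, C, G, U)."""
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:  # assumption: unknown symbols stay all-zero
            row[index[base]] = 1
        rows.append(row)
    return rows


def ktuple_frequencies(seq, k=2):
    """Return the relative frequency of every length-k tuple over ACGU."""
    seq = seq.upper()
    total = len(seq) - k + 1  # number of overlapping windows
    keys = ["".join(t) for t in product(NUCLEOTIDES, repeat=k)]
    if total <= 0:
        return {key: 0.0 for key in keys}
    counts = Counter(seq[i:i + k] for i in range(total))
    return {key: counts[key] / total for key in keys}


if __name__ == "__main__":
    example = "GAUCUAG"  # hypothetical fragment, not taken from the data sets
    print(one_hot(example)[:2])
    print(ktuple_frequencies(example, k=2)["UA"])
```

A fixed tuple ordering keeps the frequency vector the same length for every sequence (4^k entries), which is what lets such features be stacked into a matrix for a downstream model.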
