CircPCBL: Identification of Plant CircRNAs with a CNN-BiGRU-GLT Model

Circular RNAs (circRNAs), which are produced post-splicing of pre-mRNAs, are strongly linked to the emergence of several tumor types. The initial stage in conducting follow-up studies involves identifying circRNAs. Currently, animals are the primary target of most established circRNA recognition tec...

Bibliographic Details
Main authors: Wu, Pengpeng; Nie, Zhenjun; Huang, Zhiqiang; Zhang, Xiaodan
Format: Online Article Text
Language: English
Published: MDPI 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10143888/
https://www.ncbi.nlm.nih.gov/pubmed/37111874
http://dx.doi.org/10.3390/plants12081652
collection PubMed
description Circular RNAs (circRNAs), which are produced post-splicing of pre-mRNAs, are strongly linked to the emergence of several tumor types. Identifying circRNAs is the first step in any follow-up study. Currently, most established circRNA recognition tools target animals. However, the sequence features of plant circRNAs differ from those of animal circRNAs, so these tools cannot reliably detect plant circRNAs. For example, plant circRNAs have non-GT/AG splicing signals at their junction sites and few reverse complementary sequences or repetitive elements in their flanking intron sequences. In addition, there have been few studies on circRNAs in plants, so a plant-specific method for identifying circRNAs is urgently needed. In this study, we propose CircPCBL, a deep-learning approach that uses only raw sequences to distinguish plant circRNAs from other lncRNAs. CircPCBL comprises two separate detectors: a CNN-BiGRU detector and a GLT detector. The CNN-BiGRU detector takes the one-hot encoding of the RNA sequence as input, while the GLT detector uses k-mer (k = 1 to 4) features. The output matrices of the two submodels are then concatenated and passed through a fully connected layer to produce the final output. To verify the generalization performance of the model, we evaluated CircPCBL on several datasets. It achieved an F1 of 85.40% on the validation dataset composed of six different plant species, and 85.88%, 75.87%, and 86.83% on the three cross-species independent test sets composed of Cucumis sativus, Populus trichocarpa, and Gossypium raimondii, respectively. On the real set, CircPCBL correctly predicted ten of the eleven experimentally reported circRNAs of Poncirus trifoliata and nine of the ten lncRNAs of rice, for accuracies of 90.9% and 90%, respectively. CircPCBL could potentially contribute to the identification of circRNAs in plants. In addition, CircPCBL achieved an average accuracy of 94.08% on the human datasets, suggesting potential application to animal datasets. CircPCBL is available as a web server, from which the data and source code can also be downloaded free of charge.
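The two input representations named in the abstract (one-hot encoding for the CNN-BiGRU detector, k-mer features with k = 1 to 4 for the GLT detector) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the mapping of U to T, the all-zero rows for unknown bases, and the normalization of k-mer counts to frequencies are all assumptions made for illustration; the abstract does not specify them.

```python
from itertools import product

import numpy as np

ALPHABET = "ACGT"  # assumption: sequences use A/C/G/T; U is mapped to T


def normalize(seq: str) -> str:
    """Upper-case the sequence and map RNA 'U' to 'T' (illustrative choice)."""
    return seq.upper().replace("U", "T")


def one_hot(seq: str) -> np.ndarray:
    """Return a (length, 4) one-hot matrix; unknown bases stay all-zero."""
    seq = normalize(seq)
    index = {base: i for i, base in enumerate(ALPHABET)}
    matrix = np.zeros((len(seq), len(ALPHABET)), dtype=np.float32)
    for pos, base in enumerate(seq):
        col = index.get(base)
        if col is not None:
            matrix[pos, col] = 1.0
    return matrix


def kmer_frequencies(seq: str, k_values=(1, 2, 3, 4)) -> np.ndarray:
    """Concatenate k-mer frequency vectors for each k in k_values.

    For k = 1..4 this yields 4 + 16 + 64 + 256 = 340 features.
    Frequencies (counts divided by the number of valid k-mers) are an
    assumption; raw counts would be an equally plausible choice.
    """
    seq = normalize(seq)
    features = []
    for k in k_values:
        kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
        counts = dict.fromkeys(kmers, 0)
        for i in range(max(len(seq) - k + 1, 0)):
            kmer = seq[i:i + k]
            if kmer in counts:  # skip windows containing unknown bases
                counts[kmer] += 1
        total = sum(counts.values())
        features.extend(counts[m] / total if total else 0.0 for m in kmers)
    return np.array(features, dtype=np.float32)


if __name__ == "__main__":
    example = "ACGUACGU"  # toy sequence, not taken from the paper's data
    print(one_hot(example).shape)
    print(kmer_frequencies(example).shape)
```

In a two-branch model of the kind the abstract describes, the first array would feed the sequence branch and the second the feature branch, with the branch outputs concatenated before a final fully connected layer.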
id pubmed-10143888
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-10143888 2023-04-29 CircPCBL: Identification of Plant CircRNAs with a CNN-BiGRU-GLT Model Wu, Pengpeng Nie, Zhenjun Huang, Zhiqiang Zhang, Xiaodan Plants (Basel) Article MDPI 2023-04-14 /pmc/articles/PMC10143888/ /pubmed/37111874 http://dx.doi.org/10.3390/plants12081652 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title CircPCBL: Identification of Plant CircRNAs with a CNN-BiGRU-GLT Model
topic Article