PINC: A Tool for Non-Coding RNA Identification in Plants Based on an Automated Machine Learning Framework

Bibliographic Details
Main Authors: Zhang, Xiaodan; Zhou, Xiaohu; Wan, Midi; Xuan, Jinxiang; Jin, Xiu; Li, Shaowen
Format: Online Article Text
Language: English
Published: MDPI 2022
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9570155/
https://www.ncbi.nlm.nih.gov/pubmed/36233123
http://dx.doi.org/10.3390/ijms231911825
author Zhang, Xiaodan
Zhou, Xiaohu
Wan, Midi
Xuan, Jinxiang
Jin, Xiu
Li, Shaowen
collection PubMed
description There is evidence that non-coding RNAs (ncRNAs) play significant roles in the regulation of nutrient homeostasis, development, and stress responses in plants. Accurate identification of ncRNAs is the first step in determining their function. Although a number of machine learning tools have been developed for ncRNA identification, none is dedicated to plants. Here, an automated machine learning tool, PINC, is presented to identify ncRNAs in plants from RNA sequences. First, 91 features were extracted from the sequence. Second, the F-test and a variance threshold were combined for feature selection, reducing the set to 10 features. Third, the AutoGluon framework was used to train models for robust identification of ncRNAs on datasets constructed for four plant species. Last, these steps were combined into a tool, PINC, which was validated on nine independent test sets, with accuracy ranging from 92.74% to 96.42%. Compared with CPC2, CPAT, CPPred, and CNIT, PINC outperformed the other tools on at least five of the eight evaluation indicators. PINC is expected to contribute to identifying and annotating novel ncRNAs in plants.
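The feature-selection step in the description (a variance threshold combined with an F-test) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy matrix `X`, the labels `y`, the threshold value, and the order of the two filters are all assumptions made for illustration, and the record does not list the 91 features or the 10 that were kept. The sketch uses only the Python standard library and computes a one-way ANOVA F statistic per feature.

```python
from statistics import mean, pvariance


def variance_filter(X, threshold=0.0):
    """Return indices of columns whose population variance exceeds threshold."""
    n_features = len(X[0])
    return [j for j in range(n_features)
            if pvariance([row[j] for row in X]) > threshold]


def f_statistic(values, labels):
    """One-way ANOVA F statistic of one feature across the label groups."""
    groups = {}
    for v, label in zip(values, labels):
        groups.setdefault(label, []).append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        # Perfect separation (or a constant feature): avoid division by zero.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_features(X, y, k=10, var_threshold=0.0):
    """Drop low-variance columns, then keep the k columns with the largest F."""
    kept = variance_filter(X, var_threshold)
    scored = [(f_statistic([row[j] for row in X], y), j) for j in kept]
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest F first, ties by index
    return sorted(j for _, j in scored[:k])


# Hypothetical toy data: 4 sequences x 4 features; labels 0 = coding, 1 = ncRNA.
X = [
    [1.0, 0.2, 5.0, 0.9],
    [1.0, 0.3, 5.1, 0.1],
    [1.0, 0.8, 4.9, 0.8],
    [1.0, 0.9, 5.0, 0.2],
]
y = [0, 0, 1, 1]

if __name__ == "__main__":
    print(select_features(X, y, k=2))
```

In this toy example column 0 is constant and is removed by the variance filter, while column 3 has identical class means and receives the lowest F score, so the two retained columns are 1 and 2. The selected columns would then be handed to a model-training framework such as AutoGluon; that step is omitted here.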
format Online
Article
Text
id pubmed-9570155
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9570155 2022-10-17 Int J Mol Sci Article MDPI 2022-10-05 /pmc/articles/PMC9570155/ /pubmed/36233123 http://dx.doi.org/10.3390/ijms231911825 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title PINC: A Tool for Non-Coding RNA Identification in Plants Based on an Automated Machine Learning Framework
topic Article