PINC: A Tool for Non-Coding RNA Identification in Plants Based on an Automated Machine Learning Framework
There is evidence that non-coding RNAs play significant roles in the regulation of nutrient homeostasis, development, and stress responses in plants. Accurate identification of ncRNAs is the first step in determining their function. While a number of machine learning tools have been developed for nc...
Main Authors: | Zhang, Xiaodan; Zhou, Xiaohu; Wan, Midi; Xuan, Jinxiang; Jin, Xiu; Li, Shaowen |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9570155/ https://www.ncbi.nlm.nih.gov/pubmed/36233123 http://dx.doi.org/10.3390/ijms231911825 |
_version_ | 1784810036288028672 |
---|---|
author | Zhang, Xiaodan Zhou, Xiaohu Wan, Midi Xuan, Jinxiang Jin, Xiu Li, Shaowen |
author_sort | Zhang, Xiaodan |
collection | PubMed |
description | There is evidence that non-coding RNAs (ncRNAs) play significant roles in the regulation of nutrient homeostasis, development, and stress responses in plants. Accurate identification of ncRNAs is the first step in determining their function. Although a number of machine learning tools have been developed for ncRNA identification, none is dedicated to plants. Here, an automated machine learning tool, PINC, is presented to identify ncRNAs in plants from RNA sequences. First, 91 features were extracted from each sequence. Second, the F-test and a variance threshold were combined for feature selection, reducing the set to 10 features. The AutoGluon framework was then used to train models for robust identification of ncRNAs on datasets constructed for four plant species. Last, these steps were combined into a tool, PINC, which was validated on nine independent test sets, with accuracy ranging from 92.74% to 96.42%. Compared with CPC2, CPAT, CPPred, and CNIT, PINC outperformed the other tools on at least five of the eight evaluation indicators. PINC is expected to contribute to identifying and annotating novel ncRNAs in plants. |
format | Online Article Text |
id | pubmed-9570155 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9570155 2022-10-17 PINC: A Tool for Non-Coding RNA Identification in Plants Based on an Automated Machine Learning Framework Zhang, Xiaodan Zhou, Xiaohu Wan, Midi Xuan, Jinxiang Jin, Xiu Li, Shaowen Int J Mol Sci Article MDPI 2022-10-05 /pmc/articles/PMC9570155/ /pubmed/36233123 http://dx.doi.org/10.3390/ijms231911825 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | PINC: A Tool for Non-Coding RNA Identification in Plants Based on an Automated Machine Learning Framework |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9570155/ https://www.ncbi.nlm.nih.gov/pubmed/36233123 http://dx.doi.org/10.3390/ijms231911825 |
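
The feature-selection step summarized in the description (a variance threshold combined with the F-test) can be illustrated with a minimal sketch. This is not the PINC implementation: the order of the two filters, the function names `variance_filter`, `f_statistic`, and `select_features`, the threshold value, and the toy data are all assumptions made for illustration. The record states that 91 features were reduced to 10; the toy example below reduces 4 features to 2.

```python
from statistics import mean, pvariance


def variance_filter(columns, threshold=0.0):
    """Return indices of feature columns whose population variance exceeds threshold."""
    return [i for i, col in enumerate(columns) if pvariance(col) > threshold]


def f_statistic(col, labels):
    """One-way ANOVA F statistic of one feature across the class labels."""
    groups = {}
    for value, label in zip(col, labels):
        groups.setdefault(label, []).append(value)
    grand = mean(col)
    k, n = len(groups), len(col)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        # Perfect separation (or a constant feature) has no within-class spread.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_features(rows, labels, k, threshold=0.0):
    """Drop low-variance features, then keep the k with the highest F statistic."""
    columns = list(zip(*rows))  # transpose: one tuple per feature
    kept = variance_filter(columns, threshold)
    ranked = sorted(kept, key=lambda i: f_statistic(columns[i], labels), reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy data: 4 sequences x 4 features, labels 0 = coding, 1 = non-coding.
rows = [
    [1.0, 0.1, 5.0, 2.0],
    [1.0, 0.2, 3.0, 2.5],
    [1.0, 0.9, 4.0, 3.0],
    [1.0, 1.0, 4.0, 3.5],
]
labels = [0, 0, 1, 1]
selected = select_features(rows, labels, k=2)
print(selected)
```

In this toy set the first feature is constant and is removed by the variance filter, and the remaining features are ranked by how strongly their class means differ relative to the within-class spread. The selected columns would then be passed to a model-training framework such as AutoGluon, which is not shown here.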