
IdentPMP: identification of moonlighting proteins in plants using sequence-based learning models

BACKGROUND: A moonlighting protein refers to a protein that can perform two or more functions. Since the current moonlighting protein prediction tools mainly focus on the proteins in animals and microorganisms, and there are differences in the cells and proteins between animals and plants, these may...


Bibliographic Details
Main authors:	Liu, Xinyi; Shen, Yueyue; Zhang, Youhua; Liu, Fei; Ma, Zhiyu; Yue, Zhenyu; Yue, Yi
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2021
Subjects:	Bioinformatics
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8351581/ https://www.ncbi.nlm.nih.gov/pubmed/34434652 http://dx.doi.org/10.7717/peerj.11900

author	Liu, Xinyi; Shen, Yueyue; Zhang, Youhua; Liu, Fei; Ma, Zhiyu; Yue, Zhenyu; Yue, Yi
collection	PubMed
description	BACKGROUND: A moonlighting protein is a protein that can perform two or more functions. Current moonlighting protein prediction tools mainly focus on proteins from animals and microorganisms, and the cells and proteins of animals and plants differ, so existing tools may predict plant moonlighting proteins inaccurately. Hence, a benchmark data set and a prediction tool specific to plant moonlighting proteins are necessary. METHODS: This study used selected protein feature classes from a data set constructed in house to develop a web-based prediction tool. First, we built a data set of plant proteins and removed redundant sequences. We then performed feature selection, feature normalization and feature dimensionality reduction on the training data. Next, machine learning methods were used for preliminary modeling to select the feature classes that performed best in plant moonlighting protein prediction. The selected feature classes were incorporated into the final plant protein prediction tool. After that, we compared five machine learning methods, used grid search to optimize their parameters, and chose the most suitable method as the final model. RESULTS: The prediction results indicated that eXtreme Gradient Boosting (XGBoost) performed best, so it was used as the algorithm to construct the prediction tool, called IdentPMP (Identification of Plant Moonlighting Proteins). On the independent test set, the area under the precision-recall curve (AUPRC) and the area under the receiver operating characteristic curve (AUC) of IdentPMP are 0.43 and 0.68, which are 19.44% (0.43 vs. 0.36) and 13.33% (0.68 vs. 0.60) higher than those of state-of-the-art non-plant-specific methods, respectively. This further demonstrates that a benchmark data set and a plant-specific prediction tool are required for plant moonlighting protein studies. Finally, we implemented the tool as a web application, which users can access freely at http://identpmp.aielab.net/.
format	Online Article Text
id	pubmed-8351581
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-8351581 2021-08-24 IdentPMP: identification of moonlighting proteins in plants using sequence-based learning models. Liu, Xinyi; Shen, Yueyue; Zhang, Youhua; Liu, Fei; Ma, Zhiyu; Yue, Zhenyu; Yue, Yi. PeerJ. Bioinformatics. PeerJ Inc. 2021-08-06 /pmc/articles/PMC8351581/ /pubmed/34434652 http://dx.doi.org/10.7717/peerj.11900 Text en ©2021 Liu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title	IdentPMP: identification of moonlighting proteins in plants using sequence-based learning models
topic	Bioinformatics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8351581/ https://www.ncbi.nlm.nih.gov/pubmed/34434652 http://dx.doi.org/10.7717/peerj.11900
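The percentage gains quoted in the description can be checked with simple arithmetic. The sketch below assumes the gains are relative improvements, (new - baseline) / baseline, which reproduces the reported figures. The helper name `relative_improvement` is hypothetical and is not taken from the IdentPMP code.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Return the relative gain of `new` over `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0


# Values quoted in the abstract: (IdentPMP, best non-plant-specific method).
reported = {"AUPRC": (0.43, 0.36), "AUC": (0.68, 0.60)}

for metric, (identpmp, baseline) in reported.items():
    gain = relative_improvement(identpmp, baseline)
    print(f"{metric}: {identpmp:.2f} vs. {baseline:.2f} -> {gain:.2f}% higher")
```

Under this assumption the absolute differences are 0.07 (AUPRC) and 0.08 (AUC). The percentages are relative to the baseline scores, not percentage-point gains.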

