
Identifying Plant Pentatricopeptide Repeat Proteins Using a Variable Selection Method

Motivation: Pentatricopeptide repeat (PPR), which is a triangular pentapeptide repeat domain, plays an important role in plant growth. Features extracted from sequences are applicable to PPR protein identification using certain classification methods. However, which components of a multidimensional...


Bibliographic Details
Main Authors: Zhao, Xudong, Wang, Hanxu, Li, Hangyu, Wu, Yiming, Wang, Guohua
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7957076/
https://www.ncbi.nlm.nih.gov/pubmed/33732270
http://dx.doi.org/10.3389/fpls.2021.506681
_version_ 1783664580025647104
author Zhao, Xudong
Wang, Hanxu
Li, Hangyu
Wu, Yiming
Wang, Guohua
author_facet Zhao, Xudong
Wang, Hanxu
Li, Hangyu
Wu, Yiming
Wang, Guohua
author_sort Zhao, Xudong
collection PubMed
description Motivation: Pentatricopeptide repeat (PPR), which is a triangular pentapeptide repeat domain, plays an important role in plant growth. Features extracted from sequences are applicable to PPR protein identification using certain classification methods. However, which components of a multidimensional feature (namely variables) are more effective for protein discrimination has never been discussed. Therefore, we seek to select variables from a multidimensional feature for identifying PPR proteins. Method: A framework of variable selection for identifying PPR proteins is proposed. Samples representing PPR positive proteins and negative ones are equally split into a training and a testing set. Variable importance is regarded as scores derived from an iteration of resampling, training, and scoring step on the training set. A model selection method based on Gaussian mixture model is applied to automatic choice of variables which are effective to identify PPR proteins. Measurements are used on the testing set to show the effectiveness of the selected variables. Results: Certain variables other than the multidimensional feature they belong to do work for discrimination between PPR positive proteins and those negative ones. In addition, the content of methionine may play an important role in predicting PPR proteins.
format Online
Article
Text
id pubmed-7957076
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-79570762021-03-16 Identifying Plant Pentatricopeptide Repeat Proteins Using a Variable Selection Method Zhao, Xudong Wang, Hanxu Li, Hangyu Wu, Yiming Wang, Guohua Front Plant Sci Plant Science Motivation: Pentatricopeptide repeat (PPR), which is a triangular pentapeptide repeat domain, plays an important role in plant growth. Features extracted from sequences are applicable to PPR protein identification using certain classification methods. However, which components of a multidimensional feature (namely variables) are more effective for protein discrimination has never been discussed. Therefore, we seek to select variables from a multidimensional feature for identifying PPR proteins. Method: A framework of variable selection for identifying PPR proteins is proposed. Samples representing PPR positive proteins and negative ones are equally split into a training and a testing set. Variable importance is regarded as scores derived from an iteration of resampling, training, and scoring step on the training set. A model selection method based on Gaussian mixture model is applied to automatic choice of variables which are effective to identify PPR proteins. Measurements are used on the testing set to show the effectiveness of the selected variables. Results: Certain variables other than the multidimensional feature they belong to do work for discrimination between PPR positive proteins and those negative ones. In addition, the content of methionine may play an important role in predicting PPR proteins. Frontiers Media S.A. 2021-03-01 /pmc/articles/PMC7957076/ /pubmed/33732270 http://dx.doi.org/10.3389/fpls.2021.506681 Text en Copyright © 2021 Zhao, Wang, Li, Wu and Wang. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). 
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Plant Science
Zhao, Xudong
Wang, Hanxu
Li, Hangyu
Wu, Yiming
Wang, Guohua
Identifying Plant Pentatricopeptide Repeat Proteins Using a Variable Selection Method
title Identifying Plant Pentatricopeptide Repeat Proteins Using a Variable Selection Method
title_full Identifying Plant Pentatricopeptide Repeat Proteins Using a Variable Selection Method
title_fullStr Identifying Plant Pentatricopeptide Repeat Proteins Using a Variable Selection Method
title_full_unstemmed Identifying Plant Pentatricopeptide Repeat Proteins Using a Variable Selection Method
title_short Identifying Plant Pentatricopeptide Repeat Proteins Using a Variable Selection Method
title_sort identifying plant pentatricopeptide repeat proteins using a variable selection method
topic Plant Science
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7957076/
https://www.ncbi.nlm.nih.gov/pubmed/33732270
http://dx.doi.org/10.3389/fpls.2021.506681