
Identifying Plant Pentatricopeptide Repeat Coding Gene/Protein Using Mixed Feature Extraction Methods

Motivation: Pentatricopeptide repeat (PPR) is a triangular pentapeptide repeat domain that plays a vital role in plant growth. In this study, we seek to identify PPR coding genes and proteins using a mixture of feature extraction methods. We use four single feature extraction methods focusing on the...


Bibliographic Details
Main authors: Qu, Kaiyang; Wei, Leyi; Yu, Jiantao; Wang, Chunyu
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects: Plant Science
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6335366/
https://www.ncbi.nlm.nih.gov/pubmed/30687359
http://dx.doi.org/10.3389/fpls.2018.01961
author Qu, Kaiyang
Wei, Leyi
Yu, Jiantao
Wang, Chunyu
collection PubMed
description Motivation: Pentatricopeptide repeat (PPR) is a triangular pentapeptide repeat domain that plays a vital role in plant growth. In this study, we seek to identify PPR coding genes and proteins using a mixture of feature extraction methods. We use four single feature extraction methods focusing on the sequence, physical, and chemical properties as well as the amino acid composition, and mix the features. The Max-Relevant-Max-Distance (MRMD) technique is applied to reduce the feature dimension. Classification uses the random forest, J48, and naïve Bayes with 10-fold cross-validation. Results: Combining two of the feature extraction methods with the random forest classifier produces the highest area under the curve of 0.9848. Using MRMD to reduce the dimension improves this metric for J48 and naïve Bayes, but has little effect on the random forest results. Availability and Implementation: The webserver is available at: http://server.malab.cn/MixedPPR/index.jsp.
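The abstract names amino acid composition as one of the feature families and 10-fold cross-validation as the evaluation protocol, but this record does not give the exact feature definitions. The following is a minimal sketch under that limitation, not the authors' implementation: the function names `amino_acid_composition` and `k_fold_indices` are hypothetical, the composition is the plain 20-residue frequency vector, and the fold assignment is a simple interleaved split without stratification or shuffling.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so that every
# sequence maps to a vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue in a protein sequence.

    Hypothetical helper: non-standard characters (X, B, gaps, ...) are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k-fold cross-validation.

    Assumption: samples are assigned to folds round-robin, so each
    sample appears in exactly one test fold.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

In a full pipeline, each protein would be converted to a feature vector, and a classifier such as a random forest would be trained and scored once per fold, with the per-fold scores averaged.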
format Online
Article
Text
id pubmed-6335366
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-6335366 2019-01-25 Identifying Plant Pentatricopeptide Repeat Coding Gene/Protein Using Mixed Feature Extraction Methods. Qu, Kaiyang; Wei, Leyi; Yu, Jiantao; Wang, Chunyu. Front Plant Sci (Plant Science). Frontiers Media S.A., 2019-01-10. /pmc/articles/PMC6335366/ /pubmed/30687359 http://dx.doi.org/10.3389/fpls.2018.01961 Text, en. Copyright © 2019 Qu, Wei, Yu and Wang. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Identifying Plant Pentatricopeptide Repeat Coding Gene/Protein Using Mixed Feature Extraction Methods
topic Plant Science