Identifying Plant Pentatricopeptide Repeat Coding Gene/Protein Using Mixed Feature Extraction Methods
Main authors: | Qu, Kaiyang; Wei, Leyi; Yu, Jiantao; Wang, Chunyu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2019 |
Subjects: | Plant Science |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6335366/ https://www.ncbi.nlm.nih.gov/pubmed/30687359 http://dx.doi.org/10.3389/fpls.2018.01961 |
author | Qu, Kaiyang Wei, Leyi Yu, Jiantao Wang, Chunyu |
collection | PubMed |
description | Motivation: The pentatricopeptide repeat (PPR) is a 35-amino-acid repeat domain that plays a vital role in plant growth. In this study, we seek to identify PPR coding genes and proteins using a mixture of feature extraction methods. We use four single feature extraction methods, based on sequence information, physical and chemical properties, and amino acid composition, and then mix the resulting features. The Max-Relevant-Max-Distance (MRMD) technique is applied to reduce the feature dimension. Classification is performed with random forest, J48, and naïve Bayes classifiers using 10-fold cross-validation. Results: Combining two of the feature extraction methods with the random forest classifier produces the highest area under the curve (AUC), 0.9848. Using MRMD to reduce the dimension improves the AUC for J48 and naïve Bayes but has little effect on the random forest results. Availability and Implementation: The webserver is available at: http://server.malab.cn/MixedPPR/index.jsp. |
format | Online Article Text |
id | pubmed-6335366 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-63353662019-01-25 Identifying Plant Pentatricopeptide Repeat Coding Gene/Protein Using Mixed Feature Extraction Methods Qu, Kaiyang Wei, Leyi Yu, Jiantao Wang, Chunyu Front Plant Sci Plant Science Motivation: The pentatricopeptide repeat (PPR) is a 35-amino-acid repeat domain that plays a vital role in plant growth. In this study, we seek to identify PPR coding genes and proteins using a mixture of feature extraction methods. We use four single feature extraction methods, based on sequence information, physical and chemical properties, and amino acid composition, and then mix the resulting features. The Max-Relevant-Max-Distance (MRMD) technique is applied to reduce the feature dimension. Classification is performed with random forest, J48, and naïve Bayes classifiers using 10-fold cross-validation. Results: Combining two of the feature extraction methods with the random forest classifier produces the highest area under the curve (AUC), 0.9848. Using MRMD to reduce the dimension improves the AUC for J48 and naïve Bayes but has little effect on the random forest results. Availability and Implementation: The webserver is available at: http://server.malab.cn/MixedPPR/index.jsp. Frontiers Media S.A. 2019-01-10 /pmc/articles/PMC6335366/ /pubmed/30687359 http://dx.doi.org/10.3389/fpls.2018.01961 Text en Copyright © 2019 Qu, Wei, Yu and Wang. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Identifying Plant Pentatricopeptide Repeat Coding Gene/Protein Using Mixed Feature Extraction Methods |
topic | Plant Science |
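The record's description names amino acid composition as one of the single feature extraction methods and says the features are then mixed. The Python below is a minimal sketch under stated assumptions, not the authors' implementation: it computes amino acid composition (AAC) and, as an assumed stand-in for a sequence-based feature, dipeptide composition (DPC). It then "mixes" them by plain concatenation. The choice of DPC, concatenation as the mixing rule, and the function names `aac`, `dpc`, and `mixed_features` are all hypothetical.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes); other symbols are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, e.g. "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
DIPEPTIDE_INDEX = {d: i for i, d in enumerate(DIPEPTIDES)}


def aac(seq: str) -> list[float]:
    """Amino acid composition: relative frequency of each standard residue (20 features)."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)  # only standard residues contribute to the denominator
    return [c / total if total else 0.0 for c in counts]


def dpc(seq: str) -> list[float]:
    """Dipeptide composition (assumed stand-in feature): frequency of adjacent pairs (400 features)."""
    seq = seq.upper()
    counts = [0] * len(DIPEPTIDES)
    for i in range(len(seq) - 1):
        j = DIPEPTIDE_INDEX.get(seq[i:i + 2])
        if j is not None:  # skip pairs containing non-standard symbols
            counts[j] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def mixed_features(seq: str) -> list[float]:
    """Hypothetical 'mixed' vector: AAC concatenated with DPC (20 + 400 = 420 features)."""
    return aac(seq) + dpc(seq)


if __name__ == "__main__":
    vec = mixed_features("MKVLAAGIVALLLA")
    print(len(vec))
```

Each protein sequence becomes a fixed-length 420-element vector, which could then be passed to a random forest, J48, or naïve Bayes classifier. A dimension-reduction step such as MRMD would select a subset of these columns before classification.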