Identification of Phage Viral Proteins With Hybrid Sequence Features
Main authors: | Ru, Xiaoqing; Li, Lihong; Wang, Chunyu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2019 |
Subjects: | Microbiology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6443926/ https://www.ncbi.nlm.nih.gov/pubmed/30972038 http://dx.doi.org/10.3389/fmicb.2019.00507 |
author | Ru, Xiaoqing; Li, Lihong; Wang, Chunyu |
---|---|
collection | PubMed |
description | The uniqueness of bacteriophages makes them an important subject of bioinformatics research. In practical applications, the main interest lies in the function of bacteriophage virion proteins, so it is very important to classify bacteriophage virion proteins and non-phage virion proteins accurately. Extracting comprehensive and effective sequence features from proteins plays a vital role in protein classification. To represent protein information more fully, this paper combines the features extracted by a feature representation algorithm based on sequence information (CCPA) with those extracted by a feature representation algorithm based on sequence and structure information. After feature extraction, the Max-Relevance-Max-Distance (MRMD) algorithm is used to select the optimal feature set, i.e., the features most strongly correlated with the class labels and with low redundancy between one another. Because the random forest algorithm introduces randomness both in the samples drawn for each tree and in the features considered when splitting each node, a random forest classifier is used, and bacteriophage protein classification is evaluated with 10-fold cross-validation. The model reaches an accuracy of 93.5% in classifying phage proteins in this study. The study also found that, among the eight physicochemical properties considered, charge has the greatest impact on the classification of bacteriophage proteins. These results indicate that the model discussed in this paper is an important tool for bacteriophage protein research. |
format | Online Article Text |
id | pubmed-6443926 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-6443926 2019-04-10. Identification of Phage Viral Proteins With Hybrid Sequence Features. Ru, Xiaoqing; Li, Lihong; Wang, Chunyu. Front Microbiol (Microbiology). Frontiers Media S.A., 2019-03-26. /pmc/articles/PMC6443926/ /pubmed/30972038 http://dx.doi.org/10.3389/fmicb.2019.00507 Text en. Copyright © 2019 Ru, Li and Wang. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Identification of Phage Viral Proteins With Hybrid Sequence Features |
topic | Microbiology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6443926/ https://www.ncbi.nlm.nih.gov/pubmed/30972038 http://dx.doi.org/10.3389/fmicb.2019.00507 |
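The description explains the MRMD selection step only in words. Below is a minimal pure-Python sketch of an MRMD-style feature ranking. It assumes relevance is the absolute Pearson correlation between a feature and the class label and that distance is the mean Euclidean distance from a feature to all other features. The min-max normalization, the scaling of distances by sqrt(n_samples) and the equal weights are assumptions made for this illustration, not details taken from the paper. The helpers `minmax`, `pearson`, `mrmd_scores` and `rank_features` are hypothetical names, and the random forest / 10-fold cross-validation stage is not shown.

```python
import math
from typing import List, Sequence

# Hypothetical helpers sketching an MRMD-style feature ranking.
# Assumptions (not from the paper): min-max normalized columns,
# relevance = |Pearson r| with the label, distance = mean Euclidean
# distance to the other columns scaled by sqrt(n_samples), equal weights.


def minmax(col: Sequence[float]) -> List[float]:
    """Rescale a column to [0, 1]; a constant column becomes all zeros."""
    lo, hi = min(col), max(col)
    if hi == lo:
        return [0.0] * len(col)
    return [(v - lo) / (hi - lo) for v in col]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def mrmd_scores(X: List[List[float]], labels: Sequence[int],
                w_rel: float = 1.0, w_dist: float = 1.0) -> List[float]:
    """Score each feature column of X (rows = samples) as weighted relevance + distance."""
    n_samples, n_features = len(X), len(X[0])
    cols = [minmax([row[j] for row in X]) for j in range(n_features)]
    scale = math.sqrt(n_samples)  # keeps each scaled distance within [0, 1]
    scores = []
    for j, col in enumerate(cols):
        relevance = abs(pearson(col, labels))
        dists = [math.dist(col, other) / scale
                 for k, other in enumerate(cols) if k != j]
        distance = sum(dists) / len(dists) if dists else 0.0
        scores.append(w_rel * relevance + w_dist * distance)
    return scores


def rank_features(X: List[List[float]], labels: Sequence[int]) -> List[int]:
    """Feature indices ordered from highest to lowest MRMD-style score."""
    scores = mrmd_scores(X, labels)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


if __name__ == "__main__":
    y = [0, 0, 0, 1, 1, 1]
    # Columns: label-aligned, constant, alternating
    X = [[0, 5, 0], [0, 5, 1], [0, 5, 0], [1, 5, 1], [1, 5, 0], [1, 5, 1]]
    print(rank_features(X, y))  # [0, 2, 1]
```

In this toy example the label-aligned feature ranks first, and the constant feature ranks last because it carries no relevance, even though it sits far from the others. In a pipeline like the one summarized in the description, the top-ranked subset would then be handed to a random forest evaluated by 10-fold cross-validation, which this sketch leaves out.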