Identification of Phage Viral Proteins With Hybrid Sequence Features

The uniqueness of bacteriophages plays an important role in bioinformatics research. In real applications, the function of the bacteriophage virion proteins is the main area of interest. Therefore, it is very important to classify bacteriophage virion proteins and non-phage virion proteins accuratel...

Bibliographic Details
Main Authors: Ru, Xiaoqing, Li, Lihong, Wang, Chunyu
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6443926/
https://www.ncbi.nlm.nih.gov/pubmed/30972038
http://dx.doi.org/10.3389/fmicb.2019.00507
_version_ 1783407926156722176
author Ru, Xiaoqing
Li, Lihong
Wang, Chunyu
collection PubMed
description The uniqueness of bacteriophages plays an important role in bioinformatics research. In real applications, the function of bacteriophage virion proteins is the main area of interest, so it is important to distinguish bacteriophage virion proteins from non-phage virion proteins accurately. Extracting comprehensive and effective sequence features from proteins plays a vital role in protein classification. To represent protein information more fully, this paper combines the features extracted by a feature representation algorithm based on sequence information (CCPA) with those extracted by a feature representation algorithm based on sequence and structure information. After feature extraction, the Max-Relevance-Max-Distance (MRMD) algorithm is used to select the optimal feature set, that is, the features with the strongest correlation with the class labels and low redundancy among themselves. Because the random forest algorithm draws its training samples at random and also chooses the candidate features for each node at random, a random forest classifier is used for the bacteriophage protein classification and is evaluated with 10-fold cross-validation. The model reaches an accuracy of up to 93.5% in classifying phage proteins in this study. This study also found that, among the eight physicochemical properties considered, the charge property has the greatest impact on the classification of bacteriophage proteins. These results indicate that the model discussed in this paper is an important tool in bacteriophage protein research.
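The selection and evaluation steps named in the description can be sketched roughly as follows. This is a simplified illustration and not the authors' implementation: it assumes Pearson correlation as the relevance term and mean Euclidean distance to the other features as the distance term, added with equal weight, which this record does not confirm. The function names (`pearson`, `euclidean`, `mrmd_style_rank`, `k_fold_indices`) are hypothetical, and the random forest classifier itself is left out.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def euclidean(xs, ys):
    """Euclidean distance between two feature columns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)))


def mrmd_style_rank(columns, labels):
    """Rank feature indices by |relevance| + normalized mean distance (best first).

    Assumption: equal weighting of the two terms; a real MRMD setup may differ.
    """
    n_feat = len(columns)
    relevance = [abs(pearson(col, labels)) for col in columns]
    distance = []
    for i in range(n_feat):
        others = [euclidean(columns[i], columns[j]) for j in range(n_feat) if j != i]
        distance.append(sum(others) / len(others) if others else 0.0)
    max_d = max(distance) or 1.0  # avoid dividing by zero when all columns coincide
    scores = [r + d / max_d for r, d in zip(relevance, distance)]
    return sorted(range(n_feat), key=lambda i: scores[i], reverse=True)


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and yield (train, test) index lists for k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


if __name__ == "__main__":
    # Toy data: feature 0 tracks the label exactly, feature 1 is constant.
    cols = [[0.0, 0.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.5]]
    labels = [0, 0, 1, 1]
    print(mrmd_style_rank(cols, labels))
    for train, test in k_fold_indices(20, k=10):
        print(len(train), len(test))
```

In a fuller pipeline, the top-ranked columns would be passed to a random forest classifier from an existing library, trained on each `train` split and scored on the matching `test` split, with the ten fold accuracies averaged.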
format Online
Article
Text
id pubmed-6443926
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-6443926 2019-04-10 Identification of Phage Viral Proteins With Hybrid Sequence Features. Ru, Xiaoqing; Li, Lihong; Wang, Chunyu. Front Microbiol. Microbiology. Frontiers Media S.A. 2019-03-26. /pmc/articles/PMC6443926/ /pubmed/30972038 http://dx.doi.org/10.3389/fmicb.2019.00507 Text en. Copyright © 2019 Ru, Li and Wang.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Identification of Phage Viral Proteins With Hybrid Sequence Features
topic Microbiology