
Using amino acid features to identify the pathogenicity of influenza B virus


Bibliographic Details
Main Authors: Kou, Zheng, Fan, Xinyue, Li, Junjie, Shao, Zehui, Qiang, Xiaoli
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9066401/
https://www.ncbi.nlm.nih.gov/pubmed/35509019
http://dx.doi.org/10.1186/s40249-022-00974-0
collection PubMed
description BACKGROUND: Influenza B virus can cause epidemics with high pathogenicity and therefore poses a serious threat to public health. This paper proposes a feature representation algorithm to identify the pathogenicity phenotype of influenza B virus.
METHODS: The dataset included all 11 influenza virus proteins encoded in the eight genome segments of 1724 strains. Two types of features were used hierarchically to build the prediction model. Amino acid features were derived directly from 67 feature descriptors and input into random forest classifiers, which output informative features: the predicted class label and the predicted probability. A sequential forward search strategy was used to optimize the informative features. The final features for each strain were low-dimensional and combined knowledge from different perspectives; they were used to build the machine learning model for pathogenicity identification.
RESULTS: Forty signature positions were identified by entropy screening. Mutations at position 135 of the hemagglutinin protein had the highest entropy value (1.06). After the informative features were generated directly from the 67 random forest models, the optimized dimensions of the class and probabilistic features were 4 and 3, respectively. The optimal class features had a maximum accuracy of 94.2% and a maximum Matthews correlation coefficient of 88.4%, while the optimal probabilistic features had a maximum accuracy of 94.1% and a maximum Matthews correlation coefficient of 88.2%. The optimized features outperformed both the original informative features and the amino acid features from individual descriptors. The sequential forward search strategy performed better than the classical ensemble method.
CONCLUSIONS: The optimized informative features had the best performance and were used to build a predictive model that identifies influenza B virus phenotypes with high pathogenicity and provides early risk warning for disease control.
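The entropy screening mentioned in the RESULTS can be illustrated with a minimal sketch. It assumes Shannon entropy computed per column of a multiple sequence alignment; the abstract does not state the logarithm base or any gap handling, so `base=2.0`, the function names, and the toy alignment below are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def column_entropy(column, base=2.0):
    """Shannon entropy of the residues observed in one alignment column.

    The log base is an assumption; the abstract does not specify it.
    """
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total, base) for n in counts.values())


def rank_positions(aligned_seqs, top_k):
    """Return (1-based position, entropy) pairs for the top_k most variable columns."""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    scored = [
        (i + 1, column_entropy([seq[i] for seq in aligned_seqs]))
        for i in range(length)
    ]
    # Highest entropy first: these are the candidate signature positions.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy alignment (four sequences, four columns), not real data.
toy_alignment = ["MKAT", "MKVT", "MRVT", "MKIT"]
top = rank_positions(toy_alignment, top_k=2)
```

In this toy case the third column varies most, so it ranks first. A real screen would run over the aligned proteins of all strains and keep the highest-scoring positions.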
GRAPHICAL ABSTRACT: [Image: see text] SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s40249-022-00974-0.
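The sequential forward search and the Matthews correlation coefficient (MCC) named in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions: each candidate feature is a binary class prediction from one base model, and the selected features are combined by a strict majority vote. The abstract instead describes feeding the selected features into a machine learning model, so the vote is a stand-in, and all names and data here are hypothetical.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def majority_vote(columns):
    """Combine binary predictions; a strict majority is needed to output 1."""
    n_samples = len(columns[0])
    return [
        1 if 2 * sum(col[i] for col in columns) > len(columns) else 0
        for i in range(n_samples)
    ]


def forward_search(features, y_true, max_size):
    """Greedily add the feature that most improves MCC; stop when none does."""
    selected, best_score = [], -1.0
    remaining = list(features)
    while remaining and len(selected) < max_size:
        scored = [
            (mcc(y_true, majority_vote([features[k] for k in selected + [name]])), name)
            for name in remaining
        ]
        score, name = max(scored)  # ties are broken by feature name
        if score <= best_score:
            break
        selected.append(name)
        remaining.remove(name)
        best_score = score
    return selected, best_score


# Hypothetical toy labels and base-model predictions, not data from the paper.
y = [1, 1, 1, 1, 0, 0, 0, 0]
base_predictions = {
    "d1": [1, 1, 1, 0, 0, 0, 0, 1],
    "d2": [1, 1, 0, 1, 0, 0, 1, 0],
    "d3": [1, 0, 1, 1, 0, 1, 0, 0],
}
chosen, score = forward_search(base_predictions, y, max_size=3)
```

Each toy predictor alone reaches an MCC of 0.5, while the three combined classify every sample correctly, which is why the search keeps adding features here. The abstract reports MCC as a percentage, so 88.4% corresponds to 0.884 on this scale.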
format Online
Article
Text
id pubmed-9066401
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9066401 2022-05-04 Using amino acid features to identify the pathogenicity of influenza B virus. Kou, Zheng; Fan, Xinyue; Li, Junjie; Shao, Zehui; Qiang, Xiaoli. Infect Dis Poverty. Research Article. BioMed Central 2022-05-04 /pmc/articles/PMC9066401/ /pubmed/35509019 http://dx.doi.org/10.1186/s40249-022-00974-0 Text en
© The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
topic Research Article