Accurately identifying hemagglutinin using sequence information and machine learning methods

Bibliographic Details
Main authors: Zou, Xidan; Ren, Liping; Cai, Peiling; Zhang, Yang; Ding, Hui; Deng, Kejun; Yu, Xiaolong; Lin, Hao; Huang, Chengbing
Format: Online, Article, Text
Language: English
Published: Frontiers Media S.A., 2023
Subjects: Medicine
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10644030/
https://www.ncbi.nlm.nih.gov/pubmed/38020152
http://dx.doi.org/10.3389/fmed.2023.1281880

Description:
INTRODUCTION: Hemagglutinin (HA) facilitates viral entry and infection by promoting fusion between the host membrane and the viral envelope. Because of its central role in influenza virus infection, HA has attracted attention as a target for influenza drug and vaccine development, so accurately identifying HA is important for developing targeted vaccines and drugs. However, in-silico methods for identifying HA are still lacking. This study aims to design a computational model to identify HA.

METHODS: A benchmark dataset of 106 HA and 106 non-HA sequences was obtained from UniProt. Samples were encoded with various sequence-based features. We performed feature optimization, fed the optimized features into four kinds of machine learning methods, and combined these into an integrated classifier using the stacking algorithm.

RESULTS AND DISCUSSION: In 5-fold cross-validation, the model achieved an accuracy of 95.85% and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.9863. On the independent test, it achieved an accuracy of 93.18% and an AUC of 0.9793. The code is available at https://github.com/Zouxidan/HA_predict.git. The proposed model has excellent prediction performance and is intended as a convenient tool for biochemical researchers studying HA.
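The abstract does not name the sequence descriptors, the four machine learning methods, or the meta-learner, so the following is only a rough, hypothetical sketch of how a stacking classifier with cross-validated accuracy and AUC can be wired together; it is not the authors' implementation (see the linked repository for that). Assumptions: amino acid composition (AAC) is the only feature, the feature optimization step is omitted, three simple pure-Python base learners (`fit_centroid`, `fit_knn`, `fit_logreg`, all hypothetical names) stand in for the paper's four methods, logistic regression serves as the meta-learner, and evaluation uses stratified 5-fold cross-validation.

```python
# Illustrative sketch only: features and learners are assumptions, not the authors' code.
import math
import random
from typing import Callable, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Model = Callable[[Sequence[float]], float]  # feature vector -> score in [0, 1]

def aac(seq: str) -> List[float]:
    """Amino acid composition: fraction of each standard residue (others ignored)."""
    seq = seq.upper()
    total = sum(seq.count(a) for a in AMINO_ACIDS) or 1
    return [seq.count(a) / total for a in AMINO_ACIDS]

def sigmoid(z: float) -> float:
    z = max(-500.0, min(500.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))

def fit_centroid(X, y) -> Model:
    """Nearest-centroid learner: score is relative closeness to the positive centroid."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    cp = mean([x for x, t in zip(X, y) if t == 1])
    cn = mean([x for x, t in zip(X, y) if t == 0])
    def score(x):
        dp, dn = math.dist(x, cp), math.dist(x, cn)
        return dn / (dp + dn) if dp + dn > 0 else 0.5
    return score

def fit_knn(X, y, k: int = 5) -> Model:
    """k-nearest-neighbour learner: score is the positive fraction among k neighbours."""
    data = list(zip(X, y))
    def score(x):
        nearest = sorted(data, key=lambda d: math.dist(x, d[0]))[:k]
        return sum(t for _, t in nearest) / len(nearest)
    return score

def fit_logreg(X, y, lr: float = 0.5, epochs: int = 200) -> Model:
    """Logistic regression trained by full-batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, t in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return lambda x: sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

BASE_LEARNERS = [fit_centroid, fit_knn, fit_logreg]  # hypothetical stand-ins for the four methods

def stratified_folds(y, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds with both classes spread evenly."""
    rng, folds = random.Random(seed), [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def split(X, y, held):
    keep = [i for i in range(len(X)) if i not in held]
    return [X[i] for i in keep], [y[i] for i in keep]

def fit_stacking(X, y, k: int = 5) -> Model:
    """Stacking: the meta-learner is trained on out-of-fold base-learner scores."""
    meta_X = [[0.0] * len(BASE_LEARNERS) for _ in X]
    for fold in stratified_folds(y, k):
        models = [fit(*split(X, y, set(fold))) for fit in BASE_LEARNERS]
        for i in fold:
            meta_X[i] = [m(X[i]) for m in models]
    meta = fit_logreg(meta_X, y)
    base = [fit(X, y) for fit in BASE_LEARNERS]  # refit base learners on all training data
    return lambda x: meta([m(x) for m in base])

def accuracy(y_true, y_pred) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores) -> float:
    """AUC = P(random positive outscores random negative); ties count one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def cross_validate(X, y, k: int = 5):
    """Outer k-fold CV of the whole stacking pipeline; returns (accuracy, AUC)."""
    scores = [0.0] * len(X)
    for fold in stratified_folds(y, k, seed=1):
        model = fit_stacking(*split(X, y, set(fold)))
        for i in fold:
            scores[i] = model(X[i])
    preds = [1 if s >= 0.5 else 0 for s in scores]
    return accuracy(y, preds), roc_auc(y, scores)
```

For example, with `X = [aac(s) for s in sequences]` and labels `y` (1 for HA, 0 for non-HA), `cross_validate(X, y)` returns an (accuracy, AUC) pair. Training the meta-learner on out-of-fold scores, rather than on scores from base learners that were fitted to the same samples, keeps stacking from simply rewarding whichever base learner overfits most.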

Source: PubMed record pubmed-10644030 (National Center for Biotechnology Information; MEDLINE/PubMed record format)
Journal: Front Med (Lausanne)
Published online: 2023-10-31
Copyright © 2023 Zou, Ren, Cai, Zhang, Ding, Deng, Yu, Lin and Huang.
License: Creative Commons Attribution License (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.