
Computational prediction of allergenic proteins based on multi-feature fusion

Allergy is an immune disorder described as an undesirable response of the immune system to typically innocuous substances in the environment. Studies have shown that the ability of proteins to trigger allergic reactions in susceptible individuals can be evaluated with bioinformatics tools. However,...


Bibliographic Details
Main authors: Liu, Bin; Yang, Ziman; Liu, Qing; Zhang, Ying; Ding, Hui; Lai, Hongyan; Li, Qun
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10622758/
https://www.ncbi.nlm.nih.gov/pubmed/37928245
http://dx.doi.org/10.3389/fgene.2023.1294159
author Liu, Bin
Yang, Ziman
Liu, Qing
Zhang, Ying
Ding, Hui
Lai, Hongyan
Li, Qun
collection PubMed
description Allergy is an immune disorder described as an undesirable response of the immune system to typically innocuous substances in the environment. Studies have shown that the ability of proteins to trigger allergic reactions in susceptible individuals can be evaluated with bioinformatics tools. However, developing computational methods that accurately identify new allergenic proteins remains a major challenge. This work proposes a machine learning model based on multi-feature fusion for predicting allergenic proteins efficiently. First, we prepared a benchmark dataset of allergenic and non-allergenic protein sequences and pretested it on a machine-learning platform. Then, three preferred feature extraction methods, amino acid composition (AAC), dipeptide composition (DPC) and composition of k-spaced amino acid pairs (CKSAAP), were chosen to extract protein sequence features. Subsequently, these features were fused and optimized using the Pearson correlation coefficient (PCC) and principal component analysis (PCA). Finally, the most representative features were selected to build the optimal predictor based on the random forest (RF) algorithm. Performance evaluation via 5-fold cross-validation showed that the final model, called iAller (https://github.com/laihongyan/iAller), could accurately distinguish allergenic from non-allergenic proteins. The prediction accuracy and AUC value on the validation dataset reached 91.4% and 0.97, respectively. This model will provide guidance for users seeking to identify more allergenic proteins.
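The three sequence encodings named in the description can be illustrated with a minimal sketch. This is not the iAller code: the function names, the handling of non-standard residues, the `max_gap` default, and fusion by simple concatenation are all assumptions made for illustration. The record does not state which gap values or normalization the authors used.

```python
from itertools import product

# The 20 standard amino acids and all 400 ordered residue pairs.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(seq):
    """Amino acid composition: fraction of each standard residue (20 values).

    Assumption: non-standard characters (e.g. X, B) are dropped before counting.
    """
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def cksaap(seq, k):
    """Composition of k-spaced amino acid pairs (400 values).

    Counts pairs (seq[i], seq[i + k + 1]), i.e. residues separated by k
    positions, and normalizes by the number of valid pairs found.
    Assumption: pairs containing a non-standard residue are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(seq) - k - 1):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in PAIRS]


def dpc(seq):
    """Dipeptide composition: adjacent pairs, the k = 0 case of CKSAAP."""
    return cksaap(seq, 0)


def fused_features(seq, max_gap=2):
    """Concatenate AAC, DPC and CKSAAP (k = 1..max_gap) into one vector.

    max_gap is a hypothetical parameter chosen for this sketch.
    """
    features = aac(seq) + dpc(seq)
    for k in range(1, max_gap + 1):
        features += cksaap(seq, k)
    return features
```

With `max_gap=2` the fused vector has 20 + 400 + 2 × 400 = 1220 entries per sequence, which shows why a reduction step such as the PCC and PCA stage mentioned in the description is useful before training a classifier.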
format Online
Article
Text
id pubmed-10622758
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10622758 2023-11-04 Front Genet Genetics Frontiers Media S.A. 2023-10-19 /pmc/articles/PMC10622758/ /pubmed/37928245 http://dx.doi.org/10.3389/fgene.2023.1294159 Text en Copyright © 2023 Liu, Yang, Liu, Zhang, Ding, Lai and Li.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Computational prediction of allergenic proteins based on multi-feature fusion
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10622758/
https://www.ncbi.nlm.nih.gov/pubmed/37928245
http://dx.doi.org/10.3389/fgene.2023.1294159