Computational prediction of allergenic proteins based on multi-feature fusion
Allergy is an autoimmune disorder described as an undesirable response of the immune system to typically innocuous substances in the environment. Studies have shown that the ability of proteins to trigger allergic reactions in susceptible individuals can be evaluated by bioinformatics tools. However,...
Main Authors: | Liu, Bin; Yang, Ziman; Liu, Qing; Zhang, Ying; Ding, Hui; Lai, Hongyan; Li, Qun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2023 |
Subjects: | Genetics |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10622758/ https://www.ncbi.nlm.nih.gov/pubmed/37928245 http://dx.doi.org/10.3389/fgene.2023.1294159 |
_version_ | 1785130612372275200 |
---|---|
author | Liu, Bin Yang, Ziman Liu, Qing Zhang, Ying Ding, Hui Lai, Hongyan Li, Qun |
author_facet | Liu, Bin Yang, Ziman Liu, Qing Zhang, Ying Ding, Hui Lai, Hongyan Li, Qun |
author_sort | Liu, Bin |
collection | PubMed |
description | Allergy is an autoimmune disorder described as an undesirable response of the immune system to typically innocuous substances in the environment. Studies have shown that the ability of proteins to trigger allergic reactions in susceptible individuals can be evaluated by bioinformatics tools. However, developing computational methods to accurately identify new allergenic proteins remains a vital challenge. This work proposes a machine learning model based on multi-feature fusion for predicting allergenic proteins efficiently. First, we prepared a benchmark dataset of allergenic and non-allergenic protein sequences and pretested it on a machine-learning platform. Then, three preferable feature extraction methods, namely amino acid composition (AAC), dipeptide composition (DPC) and composition of k-spaced amino acid pairs (CKSAAP), were chosen to extract protein sequence features. Subsequently, these features were fused and optimized using the Pearson correlation coefficient (PCC) and principal component analysis (PCA). Finally, the most representative features were selected to build the optimal predictor based on the random forest (RF) algorithm. Performance evaluation via 5-fold cross-validation showed that the final model, called iAller (https://github.com/laihongyan/iAller), could precisely distinguish allergenic proteins from non-allergenic proteins. The prediction accuracy and AUC value on the validation dataset reached 91.4% and 0.97, respectively. This model will provide guidance for users seeking to identify more allergenic proteins. |
format | Online Article Text |
id | pubmed-10622758 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-10622758 2023-11-04 Computational prediction of allergenic proteins based on multi-feature fusion Liu, Bin Yang, Ziman Liu, Qing Zhang, Ying Ding, Hui Lai, Hongyan Li, Qun Front Genet Genetics Allergy is an autoimmune disorder described as an undesirable response of the immune system to typically innocuous substances in the environment. Studies have shown that the ability of proteins to trigger allergic reactions in susceptible individuals can be evaluated by bioinformatics tools. However, developing computational methods to accurately identify new allergenic proteins remains a vital challenge. This work proposes a machine learning model based on multi-feature fusion for predicting allergenic proteins efficiently. First, we prepared a benchmark dataset of allergenic and non-allergenic protein sequences and pretested it on a machine-learning platform. Then, three preferable feature extraction methods, namely amino acid composition (AAC), dipeptide composition (DPC) and composition of k-spaced amino acid pairs (CKSAAP), were chosen to extract protein sequence features. Subsequently, these features were fused and optimized using the Pearson correlation coefficient (PCC) and principal component analysis (PCA). Finally, the most representative features were selected to build the optimal predictor based on the random forest (RF) algorithm. Performance evaluation via 5-fold cross-validation showed that the final model, called iAller (https://github.com/laihongyan/iAller), could precisely distinguish allergenic proteins from non-allergenic proteins. The prediction accuracy and AUC value on the validation dataset reached 91.4% and 0.97, respectively. This model will provide guidance for users seeking to identify more allergenic proteins. Frontiers Media S.A. 2023-10-19 /pmc/articles/PMC10622758/ /pubmed/37928245 http://dx.doi.org/10.3389/fgene.2023.1294159 Text en Copyright © 2023 Liu, Yang, Liu, Zhang, Ding, Lai and Li. 
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Genetics Liu, Bin Yang, Ziman Liu, Qing Zhang, Ying Ding, Hui Lai, Hongyan Li, Qun Computational prediction of allergenic proteins based on multi-feature fusion |
title | Computational prediction of allergenic proteins based on multi-feature fusion |
title_full | Computational prediction of allergenic proteins based on multi-feature fusion |
title_fullStr | Computational prediction of allergenic proteins based on multi-feature fusion |
title_full_unstemmed | Computational prediction of allergenic proteins based on multi-feature fusion |
title_short | Computational prediction of allergenic proteins based on multi-feature fusion |
title_sort | computational prediction of allergenic proteins based on multi-feature fusion |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10622758/ https://www.ncbi.nlm.nih.gov/pubmed/37928245 http://dx.doi.org/10.3389/fgene.2023.1294159 |
work_keys_str_mv | AT liubin computationalpredictionofallergenicproteinsbasedonmultifeaturefusion AT yangziman computationalpredictionofallergenicproteinsbasedonmultifeaturefusion AT liuqing computationalpredictionofallergenicproteinsbasedonmultifeaturefusion AT zhangying computationalpredictionofallergenicproteinsbasedonmultifeaturefusion AT dinghui computationalpredictionofallergenicproteinsbasedonmultifeaturefusion AT laihongyan computationalpredictionofallergenicproteinsbasedonmultifeaturefusion AT liqun computationalpredictionofallergenicproteinsbasedonmultifeaturefusion |
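
The description field names three sequence encodings (AAC, DPC and CKSAAP) that are fused into one feature vector. The following is a minimal sketch of those encodings in plain Python, written from their standard definitions and not taken from the iAller repository. The function names, the choice of gaps 1 to `max_gap` for CKSAAP, and the normalization by the number of residue pairs are assumptions made for illustration; the PCC/PCA optimization and random forest steps are not shown.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs: "AA", "AC", ..., "YY".
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(seq):
    """Amino acid composition: fraction of each standard residue (20 values)."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    # Non-standard letters count toward the length but get no feature of their own.
    return [seq.count(a) / n for a in AMINO_ACIDS]


def cksaap(seq, k):
    """Composition of residue pairs separated by exactly k positions (400 values)."""
    seq = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    windows = len(seq) - k - 1  # number of (i, i + k + 1) index pairs
    for i in range(windows):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    return [counts[p] / windows if windows > 0 else 0.0 for p in PAIRS]


def dpc(seq):
    """Dipeptide composition: adjacent pairs, i.e. the k = 0 case of cksaap."""
    return cksaap(seq, 0)


def fuse_features(seq, max_gap=2):
    """Concatenate AAC, DPC and CKSAAP (gaps 1..max_gap; max_gap is an assumed setting)."""
    features = aac(seq) + dpc(seq)
    for k in range(1, max_gap + 1):
        features += cksaap(seq, k)
    return features


if __name__ == "__main__":
    vector = fuse_features("MKTAYIAKQRQISFVKSHFSRQ")  # arbitrary example sequence
    print(len(vector))  # 20 + 400 + 400 * max_gap values
```

Under these assumptions each protein maps to a vector of 20 + 400 + 400 × `max_gap` values, which could then be passed to a correlation filter, PCA and a random forest classifier as the description outlines.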