
ProAll-D: protein allergen detection using long short term memory - a deep learning approach

BACKGROUND: An allergic reaction is an overreaction of the immune system to a previously encountered, typically benign molecule, frequently a protein. Allergic reactions can result in rashes, itching, mucous membrane swelling, asthma, coughing, and other unusual symptoms. To anticipate allergies, a...


Bibliographic Details
Main authors: Shanthappa, Pallavi M., Kumar, Rakshitha
Format: Online Article Text
Language: English
Published: International Association of Physical Chemists 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9484702/
https://www.ncbi.nlm.nih.gov/pubmed/36131892
http://dx.doi.org/10.5599/admet.1335
_version_ 1784791931203616768
author Shanthappa, Pallavi M.
Kumar, Rakshitha
author_facet Shanthappa, Pallavi M.
Kumar, Rakshitha
author_sort Shanthappa, Pallavi M.
collection PubMed
description BACKGROUND: An allergic reaction is an overreaction of the immune system to a previously encountered, typically benign molecule, frequently a protein. Allergic reactions can result in rashes, itching, mucous membrane swelling, asthma, coughing, and other unusual symptoms. A wide range of principles and methods have been applied in bioinformatics to anticipate allergies. Sequence-similarity approaches, such as methods based on FAO/WHO criteria, have a very low positive predictive value, which makes it difficult to predict possible allergens.
METHOD: This work proposes a deep learning model, LSTM (Long Short-Term Memory), to overcome the limitations of traditional approaches and of lower-performing machine learning models in predicting the allergenicity of dietary proteins. A total of 2,427 allergens and 2,427 non-allergens from a variety of sources, including the Central Science Laboratory and the NCBI, were used. The data were split 80:20 for training and testing. All techniques were implemented in Python. Five E-descriptors were used to describe the protein sequences of allergens and non-allergens: E1 (hydrophilic character of peptides), E2 (length), E3 (propensity to form helices), E4 (abundance and dispersion), and E5 (propensity of beta strands). The variable-length protein sequences were converted to uniform-length vectors using the ACC transformation. A total of eight machine learning techniques were considered.
RESULTS: Gaussian Naive Bayes had an accuracy of 64.14 %, the Radius Neighbors Classifier 49.2 %, the Bagging Classifier 85.8 %, AdaBoost 76.9 %, Linear Discriminant Analysis 76.13 %, Quadratic Discriminant Analysis 84.2 %, the Extra Tree Classifier 90 %, and LSTM 91.5 %.
CONCLUSION: With an AUC value of 91.5 %, the LSTM is regarded as the best model for predicting allergens. A web server called ProAll-D has been created that successfully identifies novel allergens using the LSTM approach. The ProAll-D server and data can be accessed at https://doi.org/10.17632/tjmt97xpjf.1.
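The abstract states that per-residue E-descriptors are turned into a uniform-length vector with the ACC transformation but does not give the formula or the lag used. The sketch below shows one common form of an auto/cross covariance transform, under stated assumptions: the function name `acc_transform`, the `max_lag` parameter, and the normalization by `n - lag` are illustrative choices, and the `TOY_E` table contains made-up placeholder numbers, not the published E-descriptor values. It is not the authors' implementation.

```python
from itertools import product

# PLACEHOLDER descriptor table: the numbers are invented for illustration
# and are NOT the published E-descriptor values. Each residue maps to a
# tuple of five values standing in for (E1, E2, E3, E4, E5).
TOY_E = {
    "A": (0.1, -0.2, 0.3, 0.0, -0.1),
    "G": (-0.3, 0.4, -0.1, 0.2, 0.0),
    "K": (0.5, 0.1, -0.2, -0.3, 0.2),
    "L": (-0.4, -0.1, 0.4, 0.1, 0.3),
}


def acc_transform(sequence, table, max_lag):
    """Map a variable-length sequence to a fixed-length ACC feature vector.

    For each lag 1..max_lag and each ordered descriptor pair (j, k), the
    feature is the mean over positions i of E_j(i) * E_k(i + lag).
    The output length is d * d * max_lag, independent of sequence length.
    """
    n = len(sequence)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    rows = [table[residue] for residue in sequence]  # n rows of d descriptors
    d = len(rows[0])
    features = []
    for lag in range(1, max_lag + 1):
        for j, k in product(range(d), repeat=2):
            total = sum(rows[i][j] * rows[i + lag][k] for i in range(n - lag))
            features.append(total / (n - lag))
    return features


# Two sequences of different lengths yield vectors of the same length.
short_vec = acc_transform("AGKLA", TOY_E, max_lag=2)
long_vec = acc_transform("AGKLLKGAAGKL", TOY_E, max_lag=2)
print(len(short_vec), len(long_vec))
```

With five descriptors and `max_lag=2`, both vectors have 5 x 5 x 2 = 50 entries, which is what lets sequences of different lengths feed a model that expects fixed-size input. In practice the lag would be chosen no larger than the shortest sequence in the data set minus one.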
format Online
Article
Text
id pubmed-9484702
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher International Association of Physical Chemists
record_format MEDLINE/PubMed
spelling pubmed-9484702 2022-09-20 ProAll-D: protein allergen detection using long short term memory - a deep learning approach Shanthappa, Pallavi M. Kumar, Rakshitha ADMET DMPK Original Scientific Paper International Association of Physical Chemists 2022-09-13 /pmc/articles/PMC9484702/ /pubmed/36131892 http://dx.doi.org/10.5599/admet.1335 Text en Copyright © 2022 by the authors. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Original Scientific Paper
Shanthappa, Pallavi M.
Kumar, Rakshitha
ProAll-D: protein allergen detection using long short term memory - a deep learning approach
title ProAll-D: protein allergen detection using long short term memory - a deep learning approach
topic Original Scientific Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9484702/
https://www.ncbi.nlm.nih.gov/pubmed/36131892
http://dx.doi.org/10.5599/admet.1335