
Novel machine learning method allerStat identifies statistically significant allergen-specific patterns in protein sequences

Bibliographic Details
Main authors: Goto, Kento; Tamehiro, Norimasa; Yoshida, Takumi; Hanada, Hiroyuki; Sakuma, Takuto; Adachi, Reiko; Kondo, Kazunari; Takeuchi, Ichiro
Format: Online Article Text
Language: English
Published: American Society for Biochemistry and Molecular Biology 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10209033/
https://www.ncbi.nlm.nih.gov/pubmed/37086787
http://dx.doi.org/10.1016/j.jbc.2023.104733
collection PubMed
description Cutting-edge technologies such as genome editing and synthetic biology allow us to produce novel foods and functional proteins. However, their toxicity and allergenicity must be accurately evaluated. It is known that specific amino acid sequences make some proteins allergenic, but many of these sequences remain uncharacterized. In this study, we introduce a data-driven approach and a machine-learning method to find undiscovered allergen-specific patterns (ASPs) among amino acid sequences. The proposed method enables an exhaustive search for amino acid subsequences whose frequencies are statistically significantly higher in allergenic proteins. As a proof-of-concept, we created a database containing 21,154 proteins for which the presence or absence of allergic reactions is already known and applied the proposed method to the database. The detected ASPs in this proof-of-concept study were consistent with known biological findings, and the allergenicity prediction performance using the detected ASPs was higher than that of existing approaches, indicating this method may be useful in evaluating the utility of synthetic foods and proteins.
id pubmed-10209033
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-10209033 2023-05-26 J Biol Chem Research Article American Society for Biochemistry and Molecular Biology 2023-04-21 /pmc/articles/PMC10209033/ /pubmed/37086787 http://dx.doi.org/10.1016/j.jbc.2023.104733 Text en © 2023 The Authors https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
topic Research Article