i2APP: A Two-Step Machine Learning Framework For Antiparasitic Peptides Identification

Parasites can cause enormous damage to their hosts. Studies have shown that antiparasitic peptides can inhibit the growth and development of parasites and even kill them. Because traditional biological methods to determine the activity of antiparasitic peptides are time-consuming and costly, a metho...

Bibliographic Details
Main Authors: Jiang, Minchao; Zhang, Renfeng; Xia, Yixiao; Jia, Gangyong; Yin, Yuyu; Wang, Pu; Wu, Jian; Ge, Ruiquan
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9091563/
https://www.ncbi.nlm.nih.gov/pubmed/35571057
http://dx.doi.org/10.3389/fgene.2022.884589
_version_ 1784704950392061952
author Jiang, Minchao
Zhang, Renfeng
Xia, Yixiao
Jia, Gangyong
Yin, Yuyu
Wang, Pu
Wu, Jian
Ge, Ruiquan
author_facet Jiang, Minchao
Zhang, Renfeng
Xia, Yixiao
Jia, Gangyong
Yin, Yuyu
Wang, Pu
Wu, Jian
Ge, Ruiquan
author_sort Jiang, Minchao
collection PubMed
description Parasites can cause enormous damage to their hosts. Studies have shown that antiparasitic peptides can inhibit the growth and development of parasites and even kill them. Because traditional biological methods for determining the activity of antiparasitic peptides are time-consuming and costly, a method for large-scale prediction of antiparasitic peptides (APPs) is urgently needed. We propose a computational approach called i2APP that can efficiently identify APPs using a two-step machine learning (ML) framework. First, to address the imbalance between positive and negative samples in the training set, a random undersampling method is used to generate a balanced training dataset. Then, physical and chemical features and terminus-based features are extracted, and the first classification is performed by Light Gradient Boosting Machine (LGBM) and Support Vector Machine (SVM) to obtain 264-dimensional higher-level features. These features are selected by Maximal Information Coefficient (MIC), and the features with large MIC values are retained. Finally, the SVM algorithm is used for the second classification in the optimized feature space. Thus the prediction model i2APP is fully constructed. On independent datasets, the accuracy and AUC of i2APP are 0.913 and 0.935, respectively, which are better than those of state-of-the-art methods. The key idea of the proposed method is that multi-level features are extracted from peptide sequences and that the higher-level features can distinguish APPs from non-APPs well.
format Online
Article
Text
id pubmed-9091563
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9091563 2022-05-12 i2APP: A Two-Step Machine Learning Framework For Antiparasitic Peptides Identification Jiang, Minchao Zhang, Renfeng Xia, Yixiao Jia, Gangyong Yin, Yuyu Wang, Pu Wu, Jian Ge, Ruiquan Front Genet Genetics Parasites can cause enormous damage to their hosts. Studies have shown that antiparasitic peptides can inhibit the growth and development of parasites and even kill them. Because traditional biological methods for determining the activity of antiparasitic peptides are time-consuming and costly, a method for large-scale prediction of antiparasitic peptides (APPs) is urgently needed. We propose a computational approach called i2APP that can efficiently identify APPs using a two-step machine learning (ML) framework. First, to address the imbalance between positive and negative samples in the training set, a random undersampling method is used to generate a balanced training dataset. Then, physical and chemical features and terminus-based features are extracted, and the first classification is performed by Light Gradient Boosting Machine (LGBM) and Support Vector Machine (SVM) to obtain 264-dimensional higher-level features. These features are selected by Maximal Information Coefficient (MIC), and the features with large MIC values are retained. Finally, the SVM algorithm is used for the second classification in the optimized feature space. Thus the prediction model i2APP is fully constructed. On independent datasets, the accuracy and AUC of i2APP are 0.913 and 0.935, respectively, which are better than those of state-of-the-art methods. The key idea of the proposed method is that multi-level features are extracted from peptide sequences and that the higher-level features can distinguish APPs from non-APPs well. Frontiers Media S.A. 2022-04-27 /pmc/articles/PMC9091563/ /pubmed/35571057 http://dx.doi.org/10.3389/fgene.2022.884589 Text en Copyright © 2022 Jiang, Zhang, Xia, Jia, Yin, Wang, Wu and Ge.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Genetics
Jiang, Minchao
Zhang, Renfeng
Xia, Yixiao
Jia, Gangyong
Yin, Yuyu
Wang, Pu
Wu, Jian
Ge, Ruiquan
i2APP: A Two-Step Machine Learning Framework For Antiparasitic Peptides Identification
title i2APP: A Two-Step Machine Learning Framework For Antiparasitic Peptides Identification
title_full i2APP: A Two-Step Machine Learning Framework For Antiparasitic Peptides Identification
title_fullStr i2APP: A Two-Step Machine Learning Framework For Antiparasitic Peptides Identification
title_full_unstemmed i2APP: A Two-Step Machine Learning Framework For Antiparasitic Peptides Identification
title_short i2APP: A Two-Step Machine Learning Framework For Antiparasitic Peptides Identification
title_sort i2app: a two-step machine learning framework for antiparasitic peptides identification
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9091563/
https://www.ncbi.nlm.nih.gov/pubmed/35571057
http://dx.doi.org/10.3389/fgene.2022.884589
work_keys_str_mv AT jiangminchao i2appatwostepmachinelearningframeworkforantiparasiticpeptidesidentification
AT zhangrenfeng i2appatwostepmachinelearningframeworkforantiparasiticpeptidesidentification
AT xiayixiao i2appatwostepmachinelearningframeworkforantiparasiticpeptidesidentification
AT jiagangyong i2appatwostepmachinelearningframeworkforantiparasiticpeptidesidentification
AT yinyuyu i2appatwostepmachinelearningframeworkforantiparasiticpeptidesidentification
AT wangpu i2appatwostepmachinelearningframeworkforantiparasiticpeptidesidentification
AT wujian i2appatwostepmachinelearningframeworkforantiparasiticpeptidesidentification
AT geruiquan i2appatwostepmachinelearningframeworkforantiparasiticpeptidesidentification
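
The description above says a random undersampling method is used to balance positive and negative training samples before any features are extracted. A minimal sketch of that one step, using only the Python standard library, might look like the following. The function name `random_undersample`, the fixed seed, and the toy peptide strings are assumptions made for illustration; they are not taken from the article, and this record does not say how the authors implemented the step.

```python
import random
from collections import defaultdict


def random_undersample(samples, labels, seed=0):
    """Return a class-balanced list of (sample, label) pairs.

    Hypothetical helper: every class is randomly cut down to the size
    of the smallest class, which is the usual meaning of random
    undersampling.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    # The smallest class sets the number of samples kept per class.
    target = min(len(group) for group in by_label.values())

    balanced = []
    for label, group in by_label.items():
        # Draw `target` samples from this class without replacement.
        for sample in rng.sample(group, target):
            balanced.append((sample, label))

    # Shuffle so that the classes are interleaved.
    rng.shuffle(balanced)
    return balanced


# Toy sequences invented for this example (1 = APP, 0 = non-APP).
peptides = ["GLFDIVKK", "KWKLFKKI", "AAAAGGGG", "DDEESSTT", "PPGGSSAA", "TTSSNNQQ"]
labels = [1, 1, 0, 0, 0, 0]

balanced = random_undersample(peptides, labels, seed=42)
```

With two positive and four negative toy sequences, the result holds two samples of each class. Both positives are always kept, and the seed only affects which negatives survive and the final order.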