i2APP: A Two-Step Machine Learning Framework For Antiparasitic Peptides Identification
Parasites can cause enormous damage to their hosts. Studies have shown that antiparasitic peptides can inhibit the growth and development of parasites and even kill them. Because traditional biological methods to determine the activity of antiparasitic peptides are time-consuming and costly, a metho...
Main authors: | Jiang, Minchao; Zhang, Renfeng; Xia, Yixiao; Jia, Gangyong; Yin, Yuyu; Wang, Pu; Wu, Jian; Ge, Ruiquan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2022 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9091563/ https://www.ncbi.nlm.nih.gov/pubmed/35571057 http://dx.doi.org/10.3389/fgene.2022.884589 |
_version_ | 1784704950392061952 |
---|---|
author | Jiang, Minchao Zhang, Renfeng Xia, Yixiao Jia, Gangyong Yin, Yuyu Wang, Pu Wu, Jian Ge, Ruiquan |
author_facet | Jiang, Minchao Zhang, Renfeng Xia, Yixiao Jia, Gangyong Yin, Yuyu Wang, Pu Wu, Jian Ge, Ruiquan |
author_sort | Jiang, Minchao |
collection | PubMed |
description | Parasites can cause enormous damage to their hosts. Studies have shown that antiparasitic peptides (APPs) can inhibit the growth and development of parasites and even kill them. Because traditional biological methods to determine the activity of antiparasitic peptides are time-consuming and costly, a method for large-scale prediction of antiparasitic peptides is urgently needed. We propose a computational approach called i2APP that can efficiently identify APPs using a two-step machine learning (ML) framework. First, to address the imbalance between positive and negative samples in the training set, random undersampling is used to generate a balanced training data set. Then, physical and chemical features and terminus-based features are extracted, and the first classification is performed by Light Gradient Boosting Machine (LGBM) and Support Vector Machine (SVM) to obtain 264-dimensional higher-level features. These features are ranked by Maximal Information Coefficient (MIC), and those with the largest MIC values are retained. Finally, the SVM algorithm is used for the second classification in the optimized feature space, which completes the i2APP prediction model. On independent datasets, the accuracy and AUC of i2APP are 0.913 and 0.935, respectively, which are better than those of state-of-the-art methods. The key idea of the proposed method is that multi-level features are extracted from peptide sequences and the higher-level features can distinguish APPs from non-APPs well. |
format | Online Article Text |
id | pubmed-9091563 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9091563 2022-05-12 i2APP: A Two-Step Machine Learning Framework For Antiparasitic Peptides Identification Jiang, Minchao Zhang, Renfeng Xia, Yixiao Jia, Gangyong Yin, Yuyu Wang, Pu Wu, Jian Ge, Ruiquan Front Genet Genetics Parasites can cause enormous damage to their hosts. Studies have shown that antiparasitic peptides (APPs) can inhibit the growth and development of parasites and even kill them. Because traditional biological methods to determine the activity of antiparasitic peptides are time-consuming and costly, a method for large-scale prediction of antiparasitic peptides is urgently needed. We propose a computational approach called i2APP that can efficiently identify APPs using a two-step machine learning (ML) framework. First, to address the imbalance between positive and negative samples in the training set, random undersampling is used to generate a balanced training data set. Then, physical and chemical features and terminus-based features are extracted, and the first classification is performed by Light Gradient Boosting Machine (LGBM) and Support Vector Machine (SVM) to obtain 264-dimensional higher-level features. These features are ranked by Maximal Information Coefficient (MIC), and those with the largest MIC values are retained. Finally, the SVM algorithm is used for the second classification in the optimized feature space, which completes the i2APP prediction model. On independent datasets, the accuracy and AUC of i2APP are 0.913 and 0.935, respectively, which are better than those of state-of-the-art methods. The key idea of the proposed method is that multi-level features are extracted from peptide sequences and the higher-level features can distinguish APPs from non-APPs well. Frontiers Media S.A. 2022-04-27 /pmc/articles/PMC9091563/ /pubmed/35571057 http://dx.doi.org/10.3389/fgene.2022.884589 Text en Copyright © 2022 Jiang, Zhang, Xia, Jia, Yin, Wang, Wu and Ge. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Genetics Jiang, Minchao Zhang, Renfeng Xia, Yixiao Jia, Gangyong Yin, Yuyu Wang, Pu Wu, Jian Ge, Ruiquan i2APP: A Two-Step Machine Learning Framework For Antiparasitic Peptides Identification |
title | i2APP: A Two-Step Machine Learning Framework For Antiparasitic Peptides Identification |
title_full | i2APP: A Two-Step Machine Learning Framework For Antiparasitic Peptides Identification |
title_fullStr | i2APP: A Two-Step Machine Learning Framework For Antiparasitic Peptides Identification |
title_full_unstemmed | i2APP: A Two-Step Machine Learning Framework For Antiparasitic Peptides Identification |
title_short | i2APP: A Two-Step Machine Learning Framework For Antiparasitic Peptides Identification |
title_sort | i2app: a two-step machine learning framework for antiparasitic peptides identification |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9091563/ https://www.ncbi.nlm.nih.gov/pubmed/35571057 http://dx.doi.org/10.3389/fgene.2022.884589 |
work_keys_str_mv | AT jiangminchao i2appatwostepmachinelearningframeworkforantiparasiticpeptidesidentification AT zhangrenfeng i2appatwostepmachinelearningframeworkforantiparasiticpeptidesidentification AT xiayixiao i2appatwostepmachinelearningframeworkforantiparasiticpeptidesidentification AT jiagangyong i2appatwostepmachinelearningframeworkforantiparasiticpeptidesidentification AT yinyuyu i2appatwostepmachinelearningframeworkforantiparasiticpeptidesidentification AT wangpu i2appatwostepmachinelearningframeworkforantiparasiticpeptidesidentification AT wujian i2appatwostepmachinelearningframeworkforantiparasiticpeptidesidentification AT geruiquan i2appatwostepmachinelearningframeworkforantiparasiticpeptidesidentification |
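
The description field above outlines a pipeline that begins with random undersampling and terminus-based feature extraction. The sketch below illustrates only those first two steps, using the Python standard library. It is not the authors' implementation: the function names `random_undersample` and `terminal_composition`, the window size `k`, and the choice of amino acid composition as the terminus feature are all assumptions made for illustration, since this record does not specify how the terminus-based features are defined. The later steps (LGBM and SVM first-stage classifiers, MIC-based selection, and the second-stage SVM) are not shown.

```python
import random
from collections import Counter

# The 20 standard amino acids, in a fixed order for feature indexing.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_undersample(samples, labels, seed=0):
    """Randomly discard samples from the larger classes so that every
    class ends up with as many samples as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        # rng.sample draws `target` items without replacement.
        for sample in rng.sample(group, target):
            balanced.append((sample, label))
    rng.shuffle(balanced)
    return [s for s, _ in balanced], [l for _, l in balanced]


def terminal_composition(peptide, k=5):
    """Hypothetical terminus-based feature: amino acid frequencies of the
    first k and the last k residues, concatenated into a 40-dim vector."""
    if not peptide:
        raise ValueError("peptide must be a non-empty sequence")
    features = []
    for segment in (peptide[:k], peptide[-k:]):
        counts = Counter(segment)
        features.extend(counts[aa] / len(segment) for aa in AMINO_ACIDS)
    return features
```

For example, `random_undersample(peptides, labels)` returns a balanced subset that could then be mapped through `terminal_composition` to obtain fixed-length vectors for a first-stage classifier. Fixing `seed` keeps the sampled subset reproducible across runs.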