ACP-ADA: A Boosting Method with Data Augmentation for Improved Prediction of Anticancer Peptides

Cancer is the second-leading cause of death worldwide, and therapeutic peptides that target and destroy cancer cells have received a great deal of interest in recent years. Traditional wet experiments are expensive and inefficient for identifying novel anticancer peptides; therefore, the development...

Bibliographic Details
Main authors: Bhattarai, Sadik, Kim, Kyu-Sik, Tayara, Hilal, Chong, Kil To
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9603247/
https://www.ncbi.nlm.nih.gov/pubmed/36293050
http://dx.doi.org/10.3390/ijms232012194
author Bhattarai, Sadik
Kim, Kyu-Sik
Tayara, Hilal
Chong, Kil To
collection PubMed
description Cancer is the second-leading cause of death worldwide, and therapeutic peptides that target and destroy cancer cells have received a great deal of interest in recent years. Traditional wet-lab experiments are expensive and inefficient for identifying novel anticancer peptides (ACPs); therefore, an effective computational approach is essential for recognizing ACP candidates before experimental methods are used. In this study, we propose ACP-ADA, an AdaBoost algorithm with random forest as the base learner, which integrates binary profile features, amino acid index, and amino acid composition into a 210-dimensional feature vector to represent the peptides. Training samples were augmented in the feature space to increase the sample size and improve model performance when samples are insufficient. Furthermore, we used five-fold cross-validation to select model parameters, and the cross-validation results showed that ACP-ADA, with this feature combination and data augmentation, outperforms existing methods on the reported performance metrics. Specifically, ACP-ADA recorded an average accuracy of 86.4% and a Matthews correlation coefficient of 74.01% on dataset ACP740, and 90.83% and 81.65%, respectively, on dataset ACP240; consequently, it can be a useful tool in drug development and biomedical research.
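Two of the quantities named in the description, amino acid composition and the Matthews correlation coefficient, have standard definitions that can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the residue ordering, and the example peptide string are hypothetical, and the record does not say how the three feature groups are combined into the 210 dimensions.

```python
from collections import Counter
from math import sqrt

# The 20 standard amino acids, in an arbitrary (alphabetical) order.
# The ordering is an assumption; the record does not specify one.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(peptide: str) -> list[float]:
    """Return the fraction of each standard residue (a 20-dimensional vector)."""
    counts = Counter(peptide.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("peptide contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of correct predictions from confusion-matrix counts."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical peptide string, used only to show the shape of the output.
    features = amino_acid_composition("AAKKLL")
    print(len(features), round(sum(features), 6))
    # Hypothetical confusion-matrix counts, not results from the paper.
    print(accuracy(45, 40, 10, 5), matthews_corrcoef(45, 40, 10, 5))
```

The coefficient lies in the range -1 to 1; the description reports it as a percentage, so a value of 74.01% corresponds to 0.7401 on that scale.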
format Online
Article
Text
id pubmed-9603247
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9603247 2022-10-27 ACP-ADA: A Boosting Method with Data Augmentation for Improved Prediction of Anticancer Peptides Bhattarai, Sadik Kim, Kyu-Sik Tayara, Hilal Chong, Kil To Int J Mol Sci Article MDPI 2022-10-13 /pmc/articles/PMC9603247/ /pubmed/36293050 http://dx.doi.org/10.3390/ijms232012194 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title ACP-ADA: A Boosting Method with Data Augmentation for Improved Prediction of Anticancer Peptides
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9603247/
https://www.ncbi.nlm.nih.gov/pubmed/36293050
http://dx.doi.org/10.3390/ijms232012194