Prediction of HIV-1 protease cleavage site from octapeptide sequence information using selected classifiers and hybrid descriptors
BACKGROUND: In most parts of the world, especially in underdeveloped countries, acquired immunodeficiency syndrome (AIDS) still remains a major cause of death, disability, and unfavorable economic outcomes. This has necessitated intensive research to develop effective therapeutic agents for the trea...
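Main Authors: | Onah, Emmanuel; Uzor, Philip F.; Ugwoke, Ikenna Calvin; Eze, Jude Uche; Ugwuanyi, Sunday Tochukwu; Chukwudi, Ifeanyi Richard; Ibezim, Akachukwu |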
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9641908/ https://www.ncbi.nlm.nih.gov/pubmed/36344934 http://dx.doi.org/10.1186/s12859-022-05017-x |
_version_ | 1784826187231526912 |
---|---|
author | Onah, Emmanuel; Uzor, Philip F.; Ugwoke, Ikenna Calvin; Eze, Jude Uche; Ugwuanyi, Sunday Tochukwu; Chukwudi, Ifeanyi Richard; Ibezim, Akachukwu |
author_facet | Onah, Emmanuel; Uzor, Philip F.; Ugwoke, Ikenna Calvin; Eze, Jude Uche; Ugwuanyi, Sunday Tochukwu; Chukwudi, Ifeanyi Richard; Ibezim, Akachukwu |
author_sort | Onah, Emmanuel |
collection | PubMed |
description | BACKGROUND: In most parts of the world, especially in underdeveloped countries, acquired immunodeficiency syndrome (AIDS) still remains a major cause of death, disability, and unfavorable economic outcomes. This has necessitated intensive research to develop effective therapeutic agents for the treatment of human immunodeficiency virus (HIV) infection, which is responsible for AIDS. Peptide cleavage by HIV-1 protease is an essential step in the replication of HIV-1. Thus, correct and timely prediction of the cleavage site of HIV-1 protease can significantly speed up and optimize the drug discovery process of novel HIV-1 protease inhibitors. In this work, we built and compared the performance of selected machine learning models for the prediction of HIV-1 protease cleavage site utilizing a hybrid of octapeptide sequence information comprising bond composition, amino acid binary profile (AABP), and physicochemical properties as numerical descriptors serving as input variables for some selected machine learning algorithms. Our work differs from antecedent studies exploring the same subject in the combination of octapeptide descriptors and method used. Instead of using various subsets of the dataset for training and testing the models, we combined the dataset, applied a 3-way data split, and then used a "stratified" 10-fold cross-validation technique alongside the testing set to evaluate the models. RESULTS: Among the 8 models evaluated in the “stratified” 10-fold CV experiment, logistic regression, multi-layer perceptron classifier, linear discriminant analysis, gradient boosting classifier, Naive Bayes classifier, and decision tree classifier with AUC, F-score, and B. Acc. scores in the ranges of 0.91–0.96, 0.81–0.88, and 80.1–86.4%, respectively, have the closest predictive performance to the state-of-the-art model (AUC 0.96, F-score 0.80 and B. Acc. ~ 80.0%). 
In contrast, the perceptron classifier and K-nearest neighbors had statistically significantly lower performance (AUC 0.77–0.82, F-score 0.53–0.69, and B. Acc. 60.0–68.5%) at p < 0.05. On further evaluation on the testing set, logistic regression and the multi-layer perceptron classifier (AUC of 0.97, F-score > 0.89, and B. Acc. > 90.0%) performed best, though linear discriminant analysis, the gradient boosting classifier, and the Naive Bayes classifier also performed well (AUC > 0.94, F-score > 0.87, and B. Acc. > 86.0%). CONCLUSIONS: Logistic regression and multi-layer perceptron classifiers have predictive performance comparable to the state-of-the-art model when octapeptide sequence descriptors consisting of AABP, bond composition, and standard physicochemical properties are used as input variables. In future work, we hope to develop standalone software for HIV-1 protease cleavage site prediction utilizing the linear regression algorithm and the aforementioned octapeptide sequence descriptors. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-05017-x. |
format | Online Article Text |
id | pubmed-9641908 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9641908 2022-11-15 Prediction of HIV-1 protease cleavage site from octapeptide sequence information using selected classifiers and hybrid descriptors Onah, Emmanuel Uzor, Philip F. Ugwoke, Ikenna Calvin Eze, Jude Uche Ugwuanyi, Sunday Tochukwu Chukwudi, Ifeanyi Richard Ibezim, Akachukwu BMC Bioinformatics Research BioMed Central 2022-11-08 /pmc/articles/PMC9641908/ /pubmed/36344934 http://dx.doi.org/10.1186/s12859-022-05017-x Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Onah, Emmanuel Uzor, Philip F. Ugwoke, Ikenna Calvin Eze, Jude Uche Ugwuanyi, Sunday Tochukwu Chukwudi, Ifeanyi Richard Ibezim, Akachukwu Prediction of HIV-1 protease cleavage site from octapeptide sequence information using selected classifiers and hybrid descriptors |
title | Prediction of HIV-1 protease cleavage site from octapeptide sequence information using selected classifiers and hybrid descriptors |
title_full | Prediction of HIV-1 protease cleavage site from octapeptide sequence information using selected classifiers and hybrid descriptors |
title_fullStr | Prediction of HIV-1 protease cleavage site from octapeptide sequence information using selected classifiers and hybrid descriptors |
title_full_unstemmed | Prediction of HIV-1 protease cleavage site from octapeptide sequence information using selected classifiers and hybrid descriptors |
title_short | Prediction of HIV-1 protease cleavage site from octapeptide sequence information using selected classifiers and hybrid descriptors |
title_sort | prediction of hiv-1 protease cleavage site from octapeptide sequence information using selected classifiers and hybrid descriptors |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9641908/ https://www.ncbi.nlm.nih.gov/pubmed/36344934 http://dx.doi.org/10.1186/s12859-022-05017-x |
work_keys_str_mv | AT onahemmanuel predictionofhiv1proteasecleavagesitefromoctapeptidesequenceinformationusingselectedclassifiersandhybriddescriptors AT uzorphilipf predictionofhiv1proteasecleavagesitefromoctapeptidesequenceinformationusingselectedclassifiersandhybriddescriptors AT ugwokeikennacalvin predictionofhiv1proteasecleavagesitefromoctapeptidesequenceinformationusingselectedclassifiersandhybriddescriptors AT ezejudeuche predictionofhiv1proteasecleavagesitefromoctapeptidesequenceinformationusingselectedclassifiersandhybriddescriptors AT ugwuanyisundaytochukwu predictionofhiv1proteasecleavagesitefromoctapeptidesequenceinformationusingselectedclassifiersandhybriddescriptors AT chukwudiifeanyirichard predictionofhiv1proteasecleavagesitefromoctapeptidesequenceinformationusingselectedclassifiersandhybriddescriptors AT ibezimakachukwu predictionofhiv1proteasecleavagesitefromoctapeptidesequenceinformationusingselectedclassifiersandhybriddescriptors |
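
The description field above mentions three ideas that a short example may make more concrete: the amino acid binary profile (AABP) of an octapeptide, "stratified" 10-fold cross-validation, and balanced accuracy (B. Acc.). The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes AABP is a per-position one-hot encoding over the 20 standard residues, that folds are assigned round-robin within each class without shuffling, and that balanced accuracy is the mean of per-class recall. All function names and the example peptide string are illustrative and do not come from the record.

```python
from collections import defaultdict

# Assumption: the 20 standard amino acids, one-letter codes, alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def binary_profile(octapeptide):
    """One-hot encode an 8-residue peptide into 8 * 20 = 160 binary features."""
    if len(octapeptide) != 8:
        raise ValueError("expected exactly 8 residues")
    features = []
    for residue in octapeptide:
        # str.index raises ValueError for a non-standard residue.
        position = AMINO_ACIDS.index(residue)
        row = [0] * len(AMINO_ACIDS)
        row[position] = 1
        features.extend(row)
    return features


def stratified_folds(labels, n_folds=10):
    """Assign sample indices to folds so each fold keeps similar class proportions.

    Simplification: round-robin within each class, no shuffling.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        for rank, index in enumerate(indices):
            folds[rank % n_folds].append(index)
    return folds


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; for two classes, the average of sensitivity and specificity."""
    recalls = []
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(correct / total)
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    vector = binary_profile("SQNYPIVQ")  # illustrative octapeptide string
    print(len(vector), sum(vector))
    toy_labels = [1] * 20 + [0] * 80  # toy imbalanced labels, not real data
    print([len(fold) for fold in stratified_folds(toy_labels)])
```

Stratifying the folds matters when cleaved and non-cleaved octapeptides are imbalanced, because an unstratified split can leave a fold with very few positive examples. Balanced accuracy is reported for the same reason: it weights each class equally regardless of how many samples it has.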