Prediction of HIV-1 protease cleavage site from octapeptide sequence information using selected classifiers and hybrid descriptors

Bibliographic details
Main authors: Onah, Emmanuel, Uzor, Philip F., Ugwoke, Ikenna Calvin, Eze, Jude Uche, Ugwuanyi, Sunday Tochukwu, Chukwudi, Ifeanyi Richard, Ibezim, Akachukwu
Format: Online, Article, Text
Language: English
Published: BioMed Central, 2022
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9641908/
https://www.ncbi.nlm.nih.gov/pubmed/36344934
http://dx.doi.org/10.1186/s12859-022-05017-x
collection PubMed
description BACKGROUND: In most parts of the world, especially in underdeveloped countries, acquired immunodeficiency syndrome (AIDS) remains a major cause of death, disability, and adverse economic outcomes. This has driven intensive research into effective therapeutic agents for treating human immunodeficiency virus (HIV) infection, the cause of AIDS. Peptide cleavage by HIV-1 protease is an essential step in HIV-1 replication, so accurate and timely prediction of HIV-1 protease cleavage sites can substantially speed up and streamline the discovery of novel HIV-1 protease inhibitors. In this work, we built and compared selected machine learning models for predicting HIV-1 protease cleavage sites, using a hybrid set of octapeptide sequence descriptors (bond composition, amino acid binary profile (AABP), and physicochemical properties) as numerical input variables. Our work differs from previous studies on the same subject in its combination of octapeptide descriptors and in the method used. Instead of training and testing the models on different subsets of the dataset, we pooled the subsets into one dataset, applied a 3-way data split, and evaluated the models with both "stratified" 10-fold cross-validation (CV) and the held-out testing set. RESULTS: Among the 8 models evaluated in the "stratified" 10-fold CV experiment, logistic regression, the multi-layer perceptron classifier, linear discriminant analysis, the gradient boosting classifier, the Naive Bayes classifier, and the decision tree classifier, with area under the ROC curve (AUC), F-score, and balanced accuracy (B. Acc.) in the ranges of 0.91–0.96, 0.81–0.88, and 80.1–86.4%, respectively, had predictive performance closest to the state-of-the-art model (AUC 0.96, F-score 0.80, and B. Acc. ~80.0%).
In contrast, the perceptron classifier and K-nearest neighbors performed significantly worse (AUC 0.77–0.82, F-score 0.53–0.69, and B. Acc. 60.0–68.5%; p < 0.05). On further evaluation on the testing set, logistic regression and the multi-layer perceptron classifier performed best (AUC 0.97, F-score > 0.89, and B. Acc. > 90.0%), although linear discriminant analysis, the gradient boosting classifier, and the Naive Bayes classifier also performed well (AUC > 0.94, F-score > 0.87, and B. Acc. > 86.0%). CONCLUSIONS: Logistic regression and multi-layer perceptron classifiers achieve predictive performance comparable to the state-of-the-art model when octapeptide sequence descriptors consisting of AABP, bond composition, and standard physicochemical properties are used as input variables. In future work, we hope to develop standalone software for HIV-1 protease cleavage site prediction using the logistic regression algorithm and the aforementioned octapeptide sequence descriptors. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-05017-x.
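Some of the descriptor and evaluation steps named in the abstract can be sketched in plain Python. This is a minimal illustration, not the authors' code. It assumes the common definition of AABP as a per-position one-hot encoding over the 20 standard amino acids (8 × 20 = 160 binary features per octapeptide), uses a simple round-robin scheme to assign samples to stratified folds, and computes balanced accuracy as the mean of sensitivity and specificity. The function names (`aabp_encode`, `stratified_kfold`, `balanced_accuracy`) and the residue ordering are hypothetical choices; bond composition, physicochemical descriptors, and the classifiers themselves are not shown.

```python
from __future__ import annotations

import random
from collections import defaultdict

# Hypothetical residue ordering: the 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def aabp_encode(octapeptide: str) -> list[int]:
    """Assumed AABP: one-hot encode each of the 8 residues into 20 bits (160 features)."""
    if len(octapeptide) != 8:
        raise ValueError("expected an octapeptide (8 residues)")
    features = [0] * (8 * len(AMINO_ACIDS))
    for pos, aa in enumerate(octapeptide.upper()):
        if aa not in AA_INDEX:
            raise ValueError(f"unknown residue {aa!r} at position {pos}")
        # Block `pos` of 20 bits gets a single 1 at the residue's index.
        features[pos * len(AMINO_ACIDS) + AA_INDEX[aa]] = 1
    return features


def stratified_kfold(labels: list[int], k: int = 10, seed: int = 0) -> list[list[int]]:
    """Split sample indices into k folds that each preserve roughly the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's shuffled indices round-robin across the k folds.
        for j, idx in enumerate(indices):
            folds[j % k].append(idx)
    return folds


def balanced_accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return (sensitivity + specificity) / 2
```

For example, `aabp_encode("SQNYPIVQ")` returns a 160-element vector containing exactly eight 1s, one per residue position. One way to reflect a 3-way split with this sketch is to hold out the testing set first and build the cross-validation folds only from the remaining data; in practice, scikit-learn's `StratifiedKFold` and `balanced_accuracy_score` cover the same steps.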
id pubmed-9641908
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal BMC Bioinformatics, Research article (record dated 2022-11-15)
BioMed Central 2022-11-08. © The Author(s) 2022. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.