
Effectively predicting HIV-1 protease cleavage sites by using an ensemble learning approach

BACKGROUND: The site information of substrates that can be cleaved by human immunodeficiency virus 1 proteases (HIV-1 PRs) is of great significance for designing effective inhibitors against HIV-1 viruses. A variety of machine learning-based algorithms have been developed to predict HIV-1 PR cleavag...


Bibliographic Details
Main authors: Hu, Lun, Li, Zhenfeng, Tang, Zehai, Zhao, Cheng, Zhou, Xi, Hu, Pengwei
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9608884/
https://www.ncbi.nlm.nih.gov/pubmed/36303135
http://dx.doi.org/10.1186/s12859-022-04999-y
author Hu, Lun
Li, Zhenfeng
Tang, Zehai
Zhao, Cheng
Zhou, Xi
Hu, Pengwei
collection PubMed
description BACKGROUND: The site information of substrates that can be cleaved by human immunodeficiency virus 1 proteases (HIV-1 PRs) is of great significance for designing effective inhibitors against HIV-1 viruses. A variety of machine learning-based algorithms have been developed to predict HIV-1 PR cleavage sites by extracting relevant features from substrate sequences. However, relying on sequence information alone is not sufficient to ensure good performance, because of uncertainty in how the datasets are split into training and testing sets. Moreover, noisy data, i.e., false positive and false negative cleavage sites, can reduce prediction accuracy. RESULTS: In this work, an ensemble learning algorithm for predicting HIV-1 PR cleavage sites, named EM-HIV, is proposed. It trains a set of weak learners, i.e., biased support vector machine classifiers, with an asymmetric bagging strategy, which alleviates the impact of data imbalance and noisy data. In addition, to make full use of substrate sequences, the features used by EM-HIV are collected from three different coding schemes, namely amino acid identities, chemical properties and variable-length coevolutionary patterns, so as to construct more relevant feature vectors for octamers. Experimental results on three independent benchmark datasets demonstrate that EM-HIV outperforms state-of-the-art prediction algorithms on several evaluation metrics. Hence, EM-HIV can be regarded as a useful tool for accurately predicting HIV-1 PR cleavage sites.
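Note on the description: the two ideas it names, encoding octamers by amino acid identity and asymmetric bagging, can be illustrated with a minimal Python sketch. This is not the EM-HIV implementation. The one-hot layout, the function names `one_hot_octamer` and `asymmetric_bags`, and the choice to sample the majority class without replacement are all assumptions made for illustration. The classifier itself (a biased support vector machine in the paper) is omitted.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def one_hot_octamer(octamer):
    """Encode an 8-residue peptide as a flat 8 x 20 = 160-element 0/1 vector.

    Assumption: "amino acid identities" is read here as plain one-hot coding.
    """
    if len(octamer) != 8:
        raise ValueError("expected an octamer (8 residues)")
    vector = []
    for residue in octamer:
        row = [0] * len(AMINO_ACIDS)
        # str.index raises ValueError for a non-standard residue
        row[AMINO_ACIDS.index(residue)] = 1
        vector.extend(row)
    return vector


def asymmetric_bags(minority, majority, n_bags, seed=0):
    """Build n_bags training sets for an imbalanced two-class problem.

    Each bag keeps every minority-class example and pairs it with a random
    sample of the majority class of the same size (capped at the number of
    majority examples available). Sampling is without replacement here,
    which is an illustrative choice.
    """
    rng = random.Random(seed)
    k = min(len(minority), len(majority))
    return [(list(minority), rng.sample(majority, k)) for _ in range(n_bags)]


# Example usage with hypothetical data
features = one_hot_octamer("SQNYPIVQ")
bags = asymmetric_bags(["m1", "m2"], list(range(10)), n_bags=3)
```

In a full pipeline, one weak learner would be trained per bag and their predictions combined, for example by voting. Because every bag is balanced, no single learner is dominated by the majority class.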
format Online
Article
Text
id pubmed-9608884
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9608884 2022-10-28 BMC Bioinformatics Research
BioMed Central 2022-10-27 /pmc/articles/PMC9608884/ /pubmed/36303135 http://dx.doi.org/10.1186/s12859-022-04999-y Text en © The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Effectively predicting HIV-1 protease cleavage sites by using an ensemble learning approach
topic Research