
Predicting HIV-1 Protease Cleavage Sites With Positive-Unlabeled Learning

Understanding the substrate specificity of HIV-1 protease plays an essential role in the prevention of HIV infection. A variety of computational models have thus been developed to predict substrate sites that are cleaved by HIV-1 protease, but most of them normally follow a supervised learning schem...


Bibliographic Details
Main authors: Li, Zhenfeng; Hu, Lun; Tang, Zehai; Zhao, Cheng
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8044780/
https://www.ncbi.nlm.nih.gov/pubmed/33868387
http://dx.doi.org/10.3389/fgene.2021.658078
_version_ 1783678561649950720
author Li, Zhenfeng
Hu, Lun
Tang, Zehai
Zhao, Cheng
author_facet Li, Zhenfeng
Hu, Lun
Tang, Zehai
Zhao, Cheng
author_sort Li, Zhenfeng
collection PubMed
description Understanding the substrate specificity of HIV-1 protease plays an essential role in the prevention of HIV infection. A variety of computational models have thus been developed to predict substrate sites that are cleaved by HIV-1 protease, but most of them follow a supervised learning scheme, building classifiers that treat experimentally verified cleavable sites as positive samples and unknown sites as negative samples. However, the negative set may contain noise, because some unknown sites may in fact be cleavable (false negatives). Hence, the classifiers are less accurate than they could be, because their predictions are biased. In this work, unknown substrate sites are regarded as unlabeled samples instead of negative ones. We propose a novel positive-unlabeled learning algorithm, namely PU-HIV, for effective prediction of HIV-1 protease cleavage sites. Features used by PU-HIV are encoded from different perspectives of substrate sequences, including amino acid identities, coevolutionary patterns and chemical properties. By adjusting the weights of the errors generated by positive and unlabeled samples, a biased support vector machine classifier can be built to complete the prediction task. In comparison with state-of-the-art prediction models, benchmarking experiments using cross-validation and independent tests demonstrated the superior performance of PU-HIV in terms of AUC, PR-AUC, and F-measure. With PU-HIV, it is therefore possible to identify previously unknown but physiologically existing substrate sites that can be cleaved by HIV-1 protease, providing valuable insights into the design of novel HIV-1 protease inhibitors for HIV treatment.
format Online
Article
Text
id pubmed-8044780
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8044780 2021-04-15 Predicting HIV-1 Protease Cleavage Sites With Positive-Unlabeled Learning Li, Zhenfeng Hu, Lun Tang, Zehai Zhao, Cheng Front Genet Genetics Frontiers Media S.A. 2021-03-26 /pmc/articles/PMC8044780/ /pubmed/33868387 http://dx.doi.org/10.3389/fgene.2021.658078 Text en Copyright © 2021 Li, Hu, Tang and Zhao. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Genetics
Li, Zhenfeng
Hu, Lun
Tang, Zehai
Zhao, Cheng
Predicting HIV-1 Protease Cleavage Sites With Positive-Unlabeled Learning
title Predicting HIV-1 Protease Cleavage Sites With Positive-Unlabeled Learning
title_full Predicting HIV-1 Protease Cleavage Sites With Positive-Unlabeled Learning
title_fullStr Predicting HIV-1 Protease Cleavage Sites With Positive-Unlabeled Learning
title_full_unstemmed Predicting HIV-1 Protease Cleavage Sites With Positive-Unlabeled Learning
title_short Predicting HIV-1 Protease Cleavage Sites With Positive-Unlabeled Learning
title_sort predicting hiv-1 protease cleavage sites with positive-unlabeled learning
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8044780/
https://www.ncbi.nlm.nih.gov/pubmed/33868387
http://dx.doi.org/10.3389/fgene.2021.658078
work_keys_str_mv AT lizhenfeng predictinghiv1proteasecleavagesiteswithpositiveunlabeledlearning
AT hulun predictinghiv1proteasecleavagesiteswithpositiveunlabeledlearning
AT tangzehai predictinghiv1proteasecleavagesiteswithpositiveunlabeledlearning
AT zhaocheng predictinghiv1proteasecleavagesiteswithpositiveunlabeledlearning
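
Illustrative note: the abstract describes a biased support vector machine in which errors on positive and unlabeled samples are weighted differently. The sketch below shows one common way to realize that idea. It is not the authors' PU-HIV implementation. It assumes scikit-learn's SVC with per-class weights, a linear kernel, and a plain one-hot encoding of amino acid identities. The function names, the weight values c_pos and c_unl, and the toy peptides are all hypothetical, and the coevolutionary and chemical features mentioned in the abstract are omitted.

```python
import numpy as np
from sklearn.svm import SVC

# The 20 standard amino acids, one column each in the one-hot encoding.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(peptide):
    """Encode a peptide as a flat binary vector, 20 entries per residue."""
    vec = np.zeros(len(peptide) * len(AMINO_ACIDS))
    for pos, aa in enumerate(peptide):
        vec[pos * len(AMINO_ACIDS) + AA_INDEX[aa]] = 1.0
    return vec


def train_biased_svm(peptides, labels, c_pos=10.0, c_unl=1.0):
    """Fit an SVM that penalizes errors on positives more than on unlabeled.

    labels: 1 = experimentally verified cleavage site, 0 = unlabeled site.
    c_pos and c_unl are assumed weights; in practice they would be tuned.
    """
    X = np.array([one_hot_encode(p) for p in peptides])
    y = np.asarray(labels)
    # class_weight scales the penalty C per class, so a misclassified
    # positive costs c_pos / c_unl times as much as a misclassified
    # unlabeled sample.
    clf = SVC(kernel="linear", C=1.0, class_weight={1: c_pos, 0: c_unl})
    clf.fit(X, y)
    return clf


# Invented toy peptides for demonstration only, not real cleavage data.
toy_peptides = ["AAAAAAAA", "AAAAAAAC", "WWWWWWWW", "WWWWWWWY"]
toy_labels = [1, 1, 0, 0]
model = train_biased_svm(toy_peptides, toy_labels)
scores = model.decision_function(
    np.array([one_hot_encode(p) for p in toy_peptides])
)
```

Setting c_pos above c_unl reflects the assumption that positive labels are reliable while the unlabeled set may hide true cleavage sites, so mistakes on unlabeled samples are tolerated more readily.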