Predicting HIV-1 Protease Cleavage Sites With Positive-Unlabeled Learning
Main authors: | Li, Zhenfeng; Hu, Lun; Tang, Zehai; Zhao, Cheng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2021 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8044780/ https://www.ncbi.nlm.nih.gov/pubmed/33868387 http://dx.doi.org/10.3389/fgene.2021.658078 |
_version_ | 1783678561649950720 |
---|---|
author | Li, Zhenfeng; Hu, Lun; Tang, Zehai; Zhao, Cheng |
collection | PubMed |
description | Understanding the substrate specificity of HIV-1 protease plays an essential role in the prevention of HIV infection. A variety of computational models have therefore been developed to predict substrate sites that are cleaved by HIV-1 protease, but most of them follow a supervised learning scheme, building classifiers that treat experimentally verified cleavable sites as positive samples and unknown sites as negative samples. However, the negative set may contain noise, because some of these unknown sites may in fact be cleavable (false negatives). As a result, the classifiers' predictions are biased and less accurate than they could be. In this work, unknown substrate sites are regarded as unlabeled samples instead of negative ones. We propose a novel positive-unlabeled learning algorithm, namely PU-HIV, for effective prediction of HIV-1 protease cleavage sites. The features used by PU-HIV are encoded from different perspectives of substrate sequences, including amino acid identities, coevolutionary patterns and chemical properties. By adjusting the weights of the errors generated by positive and unlabeled samples, a biased support vector machine classifier is built to complete the prediction task. In comparison with state-of-the-art prediction models, benchmarking experiments using cross-validation and independent tests demonstrated the superior performance of PU-HIV in terms of AUC, PR-AUC, and F-measure. Thus, PU-HIV makes it possible to identify previously unknown but physiologically existing substrate sites that can be cleaved by HIV-1 protease, providing valuable insights into the design of novel HIV-1 protease inhibitors for HIV treatment. |
format | Online Article Text |
id | pubmed-8044780 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-8044780 2021-04-15 Front Genet Genetics Frontiers Media S.A. 2021-03-26 /pmc/articles/PMC8044780/ /pubmed/33868387 http://dx.doi.org/10.3389/fgene.2021.658078 Text en Copyright © 2021 Li, Hu, Tang and Zhao. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Predicting HIV-1 Protease Cleavage Sites With Positive-Unlabeled Learning |
topic | Genetics |
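
The abstract's central idea, treating unknown sites as unlabeled samples (provisional label -1) and penalizing misclassified positive samples more heavily than misclassified unlabeled ones, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the PU-HIV implementation: it uses a linear kernel trained by full-batch subgradient descent on the weighted hinge loss, a one-hot encoding of peptide windows (for example, octapeptides spanning P4 to P4') as a stand-in for the "amino acid identities" features, and hypothetical names (`one_hot_encode`, `train_biased_svm`, `decision_function`, `c_pos`, `c_unl`, `lam`). The coevolutionary and chemical-property features are omitted.

```python
from typing import List, Sequence, Tuple

# Hypothetical sketch, not the authors' PU-HIV code.
# Assumption: features are one-hot encodings over the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_encode(peptide: str) -> List[float]:
    """Concatenate one 20-dimensional one-hot vector per residue."""
    features: List[float] = []
    for residue in peptide.upper():
        if residue not in AMINO_ACIDS:
            raise ValueError(f"unknown residue: {residue!r}")
        vec = [0.0] * len(AMINO_ACIDS)
        vec[AMINO_ACIDS.index(residue)] = 1.0
        features.extend(vec)
    return features


def train_biased_svm(
    positives: Sequence[Sequence[float]],
    unlabeled: Sequence[Sequence[float]],
    c_pos: float = 10.0,  # hypothetical weight for errors on positive samples
    c_unl: float = 1.0,   # hypothetical weight for errors on unlabeled samples
    lam: float = 0.01,    # L2 regularization strength
    epochs: int = 500,
    lr: float = 0.01,
) -> Tuple[List[float], float]:
    """Linear biased SVM fitted by full-batch subgradient descent.

    Minimizes (lam / 2) * ||w||^2
        + c_pos * sum over positives of max(0, 1 - (w.x + b))
        + c_unl * sum over unlabeled of max(0, 1 + (w.x + b)),
    i.e. unlabeled samples are provisionally labeled -1.
    """
    data = [(list(x), 1.0, c_pos) for x in positives]
    data += [(list(x), -1.0, c_unl) for x in unlabeled]
    dim = len(data[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y, cost in data:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:
                # Subgradient of cost * max(0, 1 - margin) for a violated margin.
                for j in range(dim):
                    grad_w[j] -= cost * y * x[j]
                grad_b -= cost * y
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def decision_function(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Positive scores suggest a cleavable site; negative scores suggest not."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b
```

Choosing `c_pos` larger than `c_unl` encodes the assumption that the labeled positives are reliable while some unlabeled samples may be hidden positives, so errors on the unlabeled set are tolerated more cheaply. In practice the two weights would need to be tuned, for example on held-out data.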