
HIV drug resistance prediction with weighted categorical kernel functions


Bibliographic Details
Main authors: Ramon, Elies, Belanche-Muñoz, Lluís, Pérez-Enciso, Miguel
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6668108/
https://www.ncbi.nlm.nih.gov/pubmed/31362714
http://dx.doi.org/10.1186/s12859-019-2991-2
author Ramon, Elies
Belanche-Muñoz, Lluís
Pérez-Enciso, Miguel
collection PubMed
description BACKGROUND: Antiretroviral drugs are a very effective therapy against HIV infection. However, the high mutation rate of HIV permits the emergence of variants that can be resistant to the drug treatment. Predicting the drug resistance of previously unobserved variants is therefore very important for optimal medical treatment. In this paper, we propose the use of weighted categorical kernel functions to predict drug resistance from virus sequence data. These kernel functions are very simple to implement and are able to take into account HIV data particularities, such as allele mixtures, and to weight each protein residue according to its importance, as it is known that not all positions contribute equally to the resistance. RESULTS: We analyzed 21 drugs of four classes: protease inhibitors (PI), integrase inhibitors (INI), nucleoside reverse transcriptase inhibitors (NRTI) and non-nucleoside reverse transcriptase inhibitors (NNRTI). We compared two categorical kernel functions, Overlap and Jaccard, against two well-known noncategorical kernel functions (Linear and RBF) and Random Forest (RF). Weighted versions of these kernels were also considered, where the weights were obtained from the RF decrease in node impurity. The Jaccard kernel was the best method, in either its weighted or unweighted form, for 20 out of the 21 drugs. CONCLUSIONS: Results show that kernels that take into account both the categorical nature of the data and the presence of mixtures consistently result in the best prediction model. The advantage of including weights depended on the protein targeted by the drug. In the case of reverse transcriptase, weights based on the relative importance of each position clearly increased the prediction performance, while the improvement in the protease was much smaller. This difference seems to be related to the distribution of weights, as measured by the Gini index. All methods described, together with documentation and examples, are freely available at https://bitbucket.org/elies_ramon/catkern. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2991-2) contains supplementary material, which is available to authorized users.
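The abstract names the kernels but does not give their formulas, so the following is only an illustrative sketch of how weighted Overlap and Jaccard kernels over sequences with allele mixtures might look, together with one common definition of the Gini index of a weight vector. Everything here is an assumption rather than the authors' implementation: the function names (`overlap_kernel`, `jaccard_kernel`, `gini_index`), the representation of each position as a set of alleles, the normalisation of weights to sum to 1, and the choice to score an all-empty position as 0. The catkern repository linked above is the authoritative source.

```python
from typing import List, Optional, Sequence, Set

# Assumed representation: a sequence is a list of positions, and each
# position holds a set of alleles. A mixture such as K/R is {"K", "R"}.
Seq = Sequence[Set[str]]


def _normalise(weights: Optional[Sequence[float]], n: int) -> List[float]:
    """Scale weights to sum to 1; use uniform weights if none are given."""
    if n == 0:
        raise ValueError("sequences must have at least one position")
    if weights is None:
        return [1.0 / n] * n
    if len(weights) != n:
        raise ValueError("need exactly one weight per position")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def overlap_kernel(x: Seq, y: Seq,
                   weights: Optional[Sequence[float]] = None) -> float:
    """Weighted share of positions whose allele sets are identical."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    w = _normalise(weights, len(x))
    return float(sum(wi for wi, a, b in zip(w, x, y) if a == b))


def jaccard_kernel(x: Seq, y: Seq,
                   weights: Optional[Sequence[float]] = None) -> float:
    """Weighted mean of |a & b| / |a | b| over positions.

    Unlike the overlap kernel, a mixture that shares some alleles with
    the other sequence receives partial credit.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    w = _normalise(weights, len(x))
    total = 0.0
    for wi, a, b in zip(w, x, y):
        union = a | b
        if union:  # assumption: a position empty in both scores 0
            total += wi * len(a & b) / len(union)
    return total


def gini_index(weights: Sequence[float]) -> float:
    """Gini index of non-negative weights (0 means perfectly even)."""
    n = len(weights)
    total = float(sum(weights))
    if n == 0 or total <= 0:
        raise ValueError("need non-negative weights with a positive sum")
    diff = sum(abs(a - b) for a in weights for b in weights)
    return diff / (2 * n * total)


if __name__ == "__main__":
    # Hypothetical three-position sequences; position 2 of x is a mixture.
    x = [{"L"}, {"K", "R"}, {"M"}]
    y = [{"L"}, {"K"}, {"V"}]
    print(overlap_kernel(x, y), jaccard_kernel(x, y))
    print(gini_index([0.0, 0.0, 0.0, 1.0]))
```

In this sketch the position weights are passed in as plain numbers; per the abstract, they would come from the Random Forest decrease in node impurity. A more uneven weight vector yields a higher Gini index.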
format Online
Article
Text
id pubmed-6668108
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6668108 2019-08-05 HIV drug resistance prediction with weighted categorical kernel functions Ramon, Elies Belanche-Muñoz, Lluís Pérez-Enciso, Miguel BMC Bioinformatics Research Article BioMed Central 2019-07-30 /pmc/articles/PMC6668108/ /pubmed/31362714 http://dx.doi.org/10.1186/s12859-019-2991-2 Text en © The Author(s). 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title HIV drug resistance prediction with weighted categorical kernel functions
topic Research Article