HIV-1 coreceptor usage prediction without multiple alignments: an application of string kernels
BACKGROUND: Human immunodeficiency virus type 1 (HIV-1) infects cells by means of ligand-receptor interactions. This lentivirus uses the CD4 receptor in conjunction with a chemokine coreceptor, either CXCR4 or CCR5, to enter a target cell. HIV-1 is characterized by high sequence variability. Nonethe...
Main Authors: | Boisvert, Sébastien, Marchand, Mario, Laviolette, François, Corbeil, Jacques |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2008 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2637298/ https://www.ncbi.nlm.nih.gov/pubmed/19055831 http://dx.doi.org/10.1186/1742-4690-5-110 |
_version_ | 1782164351043502080 |
---|---|
author | Boisvert, Sébastien Marchand, Mario Laviolette, François Corbeil, Jacques |
author_sort | Boisvert, Sébastien |
collection | PubMed |
description | BACKGROUND: Human immunodeficiency virus type 1 (HIV-1) infects cells by means of ligand-receptor interactions. This lentivirus uses the CD4 receptor in conjunction with a chemokine coreceptor, either CXCR4 or CCR5, to enter a target cell. HIV-1 is characterized by high sequence variability. Nonetheless, within this extensive variability, certain features must be conserved to define functions and phenotypes. The determination of coreceptor usage of HIV-1, from its protein envelope sequence, falls into a well-studied machine learning problem known as classification. The support vector machine (SVM), with string kernels, has proven to be very efficient for dealing with a wide class of classification problems ranging from text categorization to protein homology detection. In this paper, we investigate how the SVM can predict HIV-1 coreceptor usage when it is equipped with an appropriate string kernel. RESULTS: Three string kernels were compared. Accuracies of 96.35% (CCR5), 94.80% (CXCR4), and 95.15% (CCR5 and CXCR4) were achieved with the SVM equipped with the distant segments kernel on a test set of 1425 examples with a classifier built on a training set of 1425 examples. Our datasets are built with Los Alamos National Laboratory HIV Databases sequences. A web server is available at . CONCLUSION: We examined string kernels that have been used successfully for protein homology detection and propose a new one that we call the distant segments kernel. We also show how to extract the most relevant features for HIV-1 coreceptor usage. The SVM with the distant segments kernel is currently the best method described. |
format | Text |
id | pubmed-2637298 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2637298 2009-02-09 HIV-1 coreceptor usage prediction without multiple alignments: an application of string kernels Boisvert, Sébastien Marchand, Mario Laviolette, François Corbeil, Jacques Retrovirology Research BioMed Central 2008-12-04 /pmc/articles/PMC2637298/ /pubmed/19055831 http://dx.doi.org/10.1186/1742-4690-5-110 Text en Copyright © 2008 Boisvert et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | HIV-1 coreceptor usage prediction without multiple alignments: an application of string kernels |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2637298/ https://www.ncbi.nlm.nih.gov/pubmed/19055831 http://dx.doi.org/10.1186/1742-4690-5-110 |