A Machine Learning Approach for Identifying Amino Acid Signatures in the HIV Env Gene Predictive of Dementia

Bibliographic Details
Main Authors: Holman, Alexander G., Gabuzda, Dana
Format: Online Article Text
Language: English
Published: Public Library of Science 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3498126/
https://www.ncbi.nlm.nih.gov/pubmed/23166702
http://dx.doi.org/10.1371/journal.pone.0049538
author Holman, Alexander G.
Gabuzda, Dana
collection PubMed
description The identification of nucleotide sequence variations in viral pathogens linked to disease and clinical outcomes is important for developing vaccines and therapies. However, identifying these genetic variations in rapidly evolving pathogens adapting to selection pressures unique to each host presents several challenges. Machine learning tools provide new opportunities to address these challenges. In HIV infection, virus replicating within the brain causes HIV-associated dementia (HAD) and milder forms of neurocognitive impairment in 20–30% of patients with unsuppressed viremia. HIV neurotropism is primarily determined by the viral envelope (env) gene. To identify amino acid signatures in the HIV env gene predictive of HAD, we developed a machine learning pipeline using the PART rule-learning algorithm and C4.5 decision tree inducer to train a classifier on a meta-dataset (n = 860 env sequences from 78 patients: 40 HAD, 38 non-HAD). To increase the flexibility and biological relevance of our analysis, we included 4 numeric factors describing amino acid hydrophobicity, polarity, bulkiness, and charge, in addition to amino acid identities. The classifier had 75% predictive accuracy in leave-one-out cross-validation, and identified 5 signatures associated with HAD diagnosis (p<0.05, Fisher’s exact test). These HAD signatures were found in the majority of brain sequences from 8 of 10 HAD patients from an independent cohort. Additionally, 2 HAD signatures were validated against env sequences from CSF of a second independent cohort. This analysis provides insight into viral genetic determinants associated with HAD, and develops novel methods for applying machine learning tools to analyze the genetics of rapidly evolving pathogens.
format Online
Article
Text
id pubmed-3498126
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3498126 2012-11-19 A Machine Learning Approach for Identifying Amino Acid Signatures in the HIV Env Gene Predictive of Dementia Holman, Alexander G. Gabuzda, Dana PLoS One Research Article
Public Library of Science 2012-11-14 /pmc/articles/PMC3498126/ /pubmed/23166702 http://dx.doi.org/10.1371/journal.pone.0049538 Text en © 2012 Holman, Gabuzda http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title A Machine Learning Approach for Identifying Amino Acid Signatures in the HIV Env Gene Predictive of Dementia
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3498126/
https://www.ncbi.nlm.nih.gov/pubmed/23166702
http://dx.doi.org/10.1371/journal.pone.0049538
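
The description reports that each signature's association with HAD diagnosis was assessed with Fisher's exact test (p<0.05). As a rough illustration of that one step only, the sketch below computes a two-sided Fisher's exact p-value for a 2x2 table of patients (signature present or absent, by HAD or non-HAD). The function name and the signature counts are hypothetical; only the group sizes (40 HAD, 38 non-HAD) come from the record, and the record does not say whether the authors tested at the patient or the sequence level, or which software they used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. HAD, non-HAD); columns are signature present / absent.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    # The small relative tolerance guards against floating-point ties.
    total = sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)


# Hypothetical counts, for illustration only: a signature seen in 15 of 40 HAD
# patients and in 4 of 38 non-HAD patients.
p_value = fisher_exact_two_sided(15, 25, 4, 34)
print(f"two-sided p = {p_value:.4f}")
```

The two-sided rule used here (summing all tables no more probable than the observed one) is one common convention; other tools may define the two-sided p-value differently.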