Cargando…

Automated prediction of HIV drug resistance from genotype data

BACKGROUND: HIV/AIDS is a serious threat to public health. The emergence of drug resistance mutations diminishes the effectiveness of drug therapy for HIV/AIDS. Developing a computational prediction of drug resistance phenotype will enable efficient and timely selection of the best treatment regimen...

Descripción completa

Detalles Bibliográficos
Autores principales: Shen, ChenHsiang, Yu, Xiaxia, Harrison, Robert W., Weber, Irene T.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2016
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5009519/
https://www.ncbi.nlm.nih.gov/pubmed/27586700
http://dx.doi.org/10.1186/s12859-016-1114-6
Descripción
Sumario:BACKGROUND: HIV/AIDS is a serious threat to public health. The emergence of drug resistance mutations diminishes the effectiveness of drug therapy for HIV/AIDS. Developing a computational prediction of drug resistance phenotype will enable efficient and timely selection of the best treatment regimens. RESULTS: A unified encoding of protein sequence and structure was used as the feature vector for predicting phenotypic resistance from genotype data. Two machine learning algorithms, Random Forest and K-nearest neighbor, were used. The prediction accuracies were examined by five-fold cross-validation on the genotype-phenotype datasets. A supervised machine learning approach for automatic prediction of drug resistance was developed to handle genotype-phenotype datasets of HIV protease (PR) and reverse transcriptase (RT). It predicts the drug resistance phenotype and its relative severity from a query sequence. The accuracy of the classification was higher than 0.973 for eight PR inhibitors and 0.986 for ten RT inhibitors, respectively. The overall cross-validated regression R(2)-values for the severity of drug resistance were 0.772–0.953 for 8 PR inhibitors and 0.773–0.995 for 10 RT inhibitors. CONCLUSIONS: Machine learning using a unified encoding of sequence and protein structure as a feature vector provides an accurate prediction of drug resistance from genotype data. A practical webserver for clinicians has been implemented.