
Cross-validated stepwise regression for identification of novel non-nucleoside reverse transcriptase inhibitor resistance associated mutations

BACKGROUND: Linear regression models are used to quantitatively predict drug resistance, the phenotype, from the HIV-1 viral genotype. As new antiretroviral drugs become available, new resistance pathways emerge and the number of resistance associated mutations continues to increase. To accurately i...


Bibliographic Details
Main authors: Van der Borght, Koen; Van Craenenbroeck, Elke; Lecocq, Pierre; Van Houtte, Margriet; Van Kerckhove, Barbara; Bacheler, Lee; Verbeke, Geert; van Vlijmen, Herman
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3223907/
https://www.ncbi.nlm.nih.gov/pubmed/21966893
http://dx.doi.org/10.1186/1471-2105-12-386
author Van der Borght, Koen
Van Craenenbroeck, Elke
Lecocq, Pierre
Van Houtte, Margriet
Van Kerckhove, Barbara
Bacheler, Lee
Verbeke, Geert
van Vlijmen, Herman
author_sort Van der Borght, Koen
collection PubMed
description BACKGROUND: Linear regression models are used to quantitatively predict drug resistance (the phenotype) from the HIV-1 viral genotype. As new antiretroviral drugs become available, new resistance pathways emerge and the number of resistance-associated mutations continues to increase. To accurately identify which drug options are left, the main goal of the modeling has been to maximize predictivity rather than interpretability. However, we originally selected linear regression as the preferred method for its transparency, as opposed to other techniques such as neural networks. Here, we apply a method to lower the complexity of these phenotype prediction models using a 3-fold cross-validated selection of mutations. RESULTS: Compared to standard stepwise regression, we were able to reduce the number of mutations in the reverse transcriptase (RT) inhibitor models as well as the number of interaction terms accounting for synergistic and antagonistic effects. This reduction in complexity was most significant for the non-nucleoside reverse transcriptase inhibitor (NNRTI) models, while prediction accuracy was maintained and virtually all known resistance-associated mutations were retained as first-order terms in the models. Furthermore, for etravirine (ETR), better performance was seen on two years of unseen data. By analyzing the phenotype prediction models, we identified a list of forty novel NNRTI mutations putatively associated with resistance. Phenotyping of site-directed mutants confirmed the resistance association of novel variants at known NNRTI resistance positions (100, 101, 181, 190 and 221) and of mutations at positions not previously linked with NNRTI resistance (102, 139, 219, 241, 376 and 382). CONCLUSIONS: We successfully identified and validated novel NNRTI resistance-associated mutations by developing parsimonious resistance prediction models in which repeated cross-validation was applied within the stepwise regression. Our model selection technique is computationally feasible for large data sets and provides an approach to the continued identification of resistance-causing mutations.
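The abstract's idea of selecting mutations by cross-validation inside a stepwise regression can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain forward search over first-order terms only (no interaction terms), a single random 3-fold partition rather than repeated cross-validation, and an arbitrary stopping threshold. The names `cv_mse`, `forward_select_cv` and `min_gain` are hypothetical, and the data are synthetic.

```python
import numpy as np

def cv_mse(X, y, cols, folds):
    """Cross-validated mean squared error of a least-squares fit on `cols`."""
    errs = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        # Design matrix: intercept plus the currently selected mutation columns.
        A_train = np.column_stack([np.ones(train.sum()), X[train][:, cols]])
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx][:, cols]])
        beta, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        errs.append(np.mean((y[test_idx] - A_test @ beta) ** 2))
    return float(np.mean(errs))

def forward_select_cv(X, y, n_folds=3, min_gain=1e-3, seed=0):
    """Forward stepwise selection; a column is added only if it lowers the
    cross-validated error by at least `min_gain` (an assumed threshold)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    selected = []
    best = cv_mse(X, y, selected, folds)  # intercept-only baseline
    while len(selected) < X.shape[1]:
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        scores = {j: cv_mse(X, y, selected + [j], folds) for j in remaining}
        j_best = min(scores, key=scores.get)
        if best - scores[j_best] < min_gain:
            break  # no candidate improves held-out error enough
        selected.append(j_best)
        best = scores[j_best]
    return selected, best

if __name__ == "__main__":
    # Synthetic example: 0/1 columns stand in for presence of a mutation,
    # y for a numeric phenotype; only columns 1 and 4 carry signal.
    demo_rng = np.random.default_rng(42)
    X_demo = demo_rng.integers(0, 2, size=(200, 6)).astype(float)
    y_demo = 1.5 * X_demo[:, 1] + 0.8 * X_demo[:, 4] + demo_rng.normal(0, 0.1, 200)
    print(forward_select_cv(X_demo, y_demo))
```

Scoring candidates on held-out folds instead of on the training fit is what discourages adding terms that only fit noise, which is the mechanism the abstract credits for the more parsimonious models.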
format Online
Article
Text
id pubmed-3223907
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3223907 2011-11-30 BMC Bioinformatics Research Article BioMed Central 2011-10-03 /pmc/articles/PMC3223907/ /pubmed/21966893 http://dx.doi.org/10.1186/1471-2105-12-386 Text en Copyright ©2011 Van der Borght et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Cross-validated stepwise regression for identification of novel non-nucleoside reverse transcriptase inhibitor resistance associated mutations
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3223907/
https://www.ncbi.nlm.nih.gov/pubmed/21966893
http://dx.doi.org/10.1186/1471-2105-12-386