
Lasso regularization for left-censored Gaussian outcome and high-dimensional predictors


Bibliographic Details
Main Authors: Soret, Perrine; Avalos, Marta; Wittkop, Linda; Commenges, Daniel; Thiébaut, Rodolphe
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6280495/
https://www.ncbi.nlm.nih.gov/pubmed/30514234
http://dx.doi.org/10.1186/s12874-018-0609-4
author Soret, Perrine
Avalos, Marta
Wittkop, Linda
Commenges, Daniel
Thiébaut, Rodolphe
collection PubMed
description BACKGROUND: Biological assays for the quantification of markers may lack sensitivity and are therefore subject to an analytical detection limit. This is the case for human immunodeficiency virus (HIV) viral load. Below this threshold the exact value is unknown, so the values are left-censored. Statistical methods have been proposed to deal with left-censoring, but few are suited to high-dimensional data. METHODS: We propose reversing the Buckley-James least squares algorithm so that it handles left-censored data, and enhancing it with Lasso regularization to accommodate high-dimensional predictors. We present a Lasso-regularized Buckley-James least squares method with both non-parametric imputation using Kaplan-Meier and parametric imputation based on the Gaussian distribution, which is typically assumed for HIV viral load data after logarithmic transformation. Cross-validation for parameter tuning is based on an appropriate loss function that takes into account the different contributions of censored and uncensored observations. We specify how these techniques can be easily implemented using available R packages. The Lasso-regularized Buckley-James least squares method was compared to simple imputation strategies for predicting the response to antiretroviral therapy, measured by HIV viral load, from HIV genotypic mutations. We used a dataset composed of several clinical trials and cohorts from the Forum for Collaborative HIV Research (HIV Med. 2008;7:27-40). The proposed methods were also assessed on simulated data mimicking the observed data. RESULTS: Approaches accounting for left-censoring outperformed simple imputation methods in a high-dimensional setting. The Gaussian Buckley-James method with cross-validation based on the appropriate loss function showed the lowest prediction error on simulated data and, on real data, the results most consistent with the current literature on HIV mutations. CONCLUSIONS: The proposed approach handles high-dimensional predictors and left-censored outcomes and proved useful for predicting HIV viral load from HIV mutations.
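The Gaussian variant described in METHODS can be illustrated with a minimal sketch. It is not the authors' implementation (the abstract refers to R packages that are not reproduced here); it only shows the general idea under stated assumptions. Each censored outcome is replaced by its conditional mean under a normal model, E[Y | Y < c] = mu - sigma * phi(z) / Phi(z) with z = (c - mu) / sigma, and a Lasso fit is alternated with that imputation. All function names (`censored_mean`, `lasso_cd`, `gaussian_bj_lasso`), the plain coordinate-descent solver, the fixed penalty `lam`, and the residual-scale estimate taken from uncensored observations only are assumptions made for this illustration.

```python
import math
import random

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def censored_mean(mu, sigma, limit):
    """E[Y | Y < limit] for Y ~ N(mu, sigma^2), capped at the limit for numerical safety."""
    z = (limit - mu) / sigma
    value = mu - sigma * norm_pdf(z) / max(norm_cdf(z), 1e-12)
    return min(value, limit)

def soft_threshold(v, t):
    return math.copysign(max(abs(v) - t, 0.0), v)

def lasso_cd(X, y, lam, n_sweeps=50):
    """Coordinate descent for (1/2n)*||y - b0 - X b||^2 + lam*||b||_1 (intercept unpenalized)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    b0 = sum(y) / n
    resid = [yi - b0 for yi in y]
    for _ in range(n_sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            norm2 = sum(c * c for c in col) / n
            if norm2 == 0.0:
                continue
            # Correlation of column j with the partial residual (coordinate j removed).
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / norm2
            delta = new - beta[j]
            if delta:
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[j] = new
        shift = sum(resid) / n  # re-centre the intercept
        b0 += shift
        resid = [r - shift for r in resid]
    return b0, beta

def gaussian_bj_lasso(X, y_obs, censored, limit, lam, n_iter=10):
    """Alternate a Lasso fit with Gaussian imputation of the left-censored outcomes."""
    if all(censored):
        raise ValueError("at least one uncensored observation is required")
    y = list(y_obs)  # censored entries start at the detection limit
    for _ in range(n_iter):
        b0, beta = lasso_cd(X, y, lam)
        mu = [b0 + sum(b * x for b, x in zip(beta, row)) for row in X]
        # Simplification (assumption): residual scale from uncensored observations only.
        errs = [yi - mi for yi, mi, c in zip(y_obs, mu, censored) if not c]
        sigma = math.sqrt(sum(e * e for e in errs) / len(errs)) or 1e-6
        y = [censored_mean(mi, sigma, limit) if c else yi
             for yi, mi, c in zip(y_obs, mu, censored)]
    return b0, beta, sigma

# Simulated example: one informative predictor out of five, detection limit at 0.
random.seed(0)
n, p, limit = 60, 5, 0.0
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
latent = [1.0 + 2.0 * row[0] + random.gauss(0.0, 0.5) for row in X]
censored = [v < limit for v in latent]
y_obs = [max(v, limit) for v in latent]
b0, beta, sigma = gaussian_bj_lasso(X, y_obs, censored, limit, lam=0.05)
print(b0, beta, sigma)
```

In this sketch the penalty `lam` is fixed by hand. The abstract instead tunes it by cross-validation with a loss function that weights censored and uncensored observations differently, and it also describes a non-parametric Kaplan-Meier imputation; neither is reproduced here.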
format Online
Article
Text
id pubmed-6280495
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6280495 2018-12-10 Lasso regularization for left-censored Gaussian outcome and high-dimensional predictors Soret, Perrine Avalos, Marta Wittkop, Linda Commenges, Daniel Thiébaut, Rodolphe BMC Med Res Methodol Research Article BioMed Central 2018-12-04 /pmc/articles/PMC6280495/ /pubmed/30514234 http://dx.doi.org/10.1186/s12874-018-0609-4 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Lasso regularization for left-censored Gaussian outcome and high-dimensional predictors
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6280495/
https://www.ncbi.nlm.nih.gov/pubmed/30514234
http://dx.doi.org/10.1186/s12874-018-0609-4