Sparse conditional logistic regression for analyzing large-scale matched data from epidemiological studies: a simple algorithm

This paper considers the problem of estimation and variable selection for large high-dimensional data (high number of predictors p and large sample size N, without excluding the possibility that N < p) resulting from an individually matched case-control study. We develop a simple algorithm for th...

Bibliographic Details
Main Authors: Avalos, Marta, Pouyes, Hélène, Grandvalet, Yves, Orriols, Ludivine, Lagarde, Emmanuel
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4416185/
https://www.ncbi.nlm.nih.gov/pubmed/25916593
http://dx.doi.org/10.1186/1471-2105-16-S6-S1
author Avalos, Marta
Pouyes, Hélène
Grandvalet, Yves
Orriols, Ludivine
Lagarde, Emmanuel
collection PubMed
description This paper considers the problem of estimation and variable selection for large high-dimensional data (high number of predictors p and large sample size N, without excluding the possibility that N < p) resulting from an individually matched case-control study. We develop a simple algorithm for the adaptation of the Lasso and related methods to the conditional logistic regression model. Our proposal relies on the simplification of the calculations involved in the likelihood function. Then, the proposed algorithm iteratively solves reweighted Lasso problems using cyclical coordinate descent, computed along a regularization path. This method can handle large problems and deal with sparse features efficiently. We discuss benefits and drawbacks with respect to the existing available implementations. We also illustrate the interest and use of these techniques on a pharmacoepidemiological study of medication use and traffic safety.
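The description above can be illustrated with a minimal sketch, restricted to the 1:1 matched case as an assumption. For one case and one control per stratum, the conditional likelihood reduces to a logistic likelihood without intercept on the case-minus-control covariate differences. The sketch forms a weighted least-squares (reweighted Lasso) approximation in an outer loop and solves it by cyclical coordinate descent with soft-thresholding. The function names `soft_threshold`, `lambda_max`, and `sparse_clogit_pairs`, the 1/N scaling of the objective, and the weight clipping are illustrative choices, not the authors' implementation.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def lambda_max(D):
    """Smallest penalty for which the all-zero vector is optimal (1:1 pairs)."""
    return np.max(np.abs(D.sum(axis=0))) / (2.0 * D.shape[0])


def sparse_clogit_pairs(D, lam, beta0=None, n_outer=50, n_inner=100, tol=1e-8):
    """L1-penalized conditional logistic regression for 1:1 matched pairs.

    D is an (N, p) array of case-minus-control covariate differences.
    Minimizes -(1/N) * sum_i log sigmoid(D[i] @ beta) + lam * ||beta||_1.
    beta0 is an optional warm start (hypothetical convenience parameter).
    """
    N, p = D.shape
    beta = np.zeros(p) if beta0 is None else np.array(beta0, dtype=float)
    for _ in range(n_outer):
        # Quadratic approximation of the log-likelihood at the current beta.
        eta = D @ beta
        prob = 1.0 / (1.0 + np.exp(-eta))
        w = np.clip(prob * (1.0 - prob), 1e-5, None)  # IRLS weights
        z = eta + (1.0 - prob) / w                    # working response
        beta_old = beta.copy()
        r = z - D @ beta                              # current residual
        # Cyclical coordinate descent on the weighted Lasso problem.
        for _ in range(n_inner):
            max_change = 0.0
            for j in range(p):
                dj = D[:, j]
                denom = np.dot(w, dj * dj) / N
                if denom == 0.0:
                    continue
                rho = np.dot(w * dj, r) / N + denom * beta[j]
                new = soft_threshold(rho, lam) / denom
                delta = new - beta[j]
                if delta != 0.0:
                    r -= dj * delta
                    beta[j] = new
                    max_change = max(max_change, abs(delta))
            if max_change < tol:
                break
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return beta
```

A regularization path can be sketched by calling `sparse_clogit_pairs` on a decreasing sequence of penalties starting near `lambda_max(D)` and passing each solution as `beta0` to the next call. Sparse feature handling and strata with more than one control are not covered by this sketch.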
format Online
Article
Text
id pubmed-4416185
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4416185 2015-05-07 BMC Bioinformatics Research BioMed Central 2015-04-17 /pmc/articles/PMC4416185/ /pubmed/25916593 http://dx.doi.org/10.1186/1471-2105-16-S6-S1 Text en Copyright © 2015 Avalos et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Sparse conditional logistic regression for analyzing large-scale matched data from epidemiological studies: a simple algorithm
topic Research