
A two-step method for variable selection in the analysis of a case-cohort study


Bibliographic Details
Main Authors: Newcombe, P J, Connolly, S, Seaman, S, Richardson, S, Sharp, S J
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects: Methods
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5913627/
https://www.ncbi.nlm.nih.gov/pubmed/29136145
http://dx.doi.org/10.1093/ije/dyx224
collection PubMed
description BACKGROUND: Accurate detection and estimation of true exposure-outcome associations is important in aetiological analysis; when there are multiple potential exposure variables of interest, methods for detecting the subset of variables most likely to have true associations with the outcome of interest are required. Case-cohort studies often collect data on a large number of variables which have not been measured in the entire cohort (e.g. panels of biomarkers). There is a lack of guidance on methods for variable selection in case-cohort studies.
METHODS: We describe and explore the application of three variable selection methods to data from a case-cohort study. These are: (i) selecting variables based on their level of significance in univariable (i.e. one-at-a-time) Prentice-weighted Cox regression models; (ii) stepwise selection applied to Prentice-weighted Cox regression; and (iii) a two-step method which applies a Bayesian variable selection algorithm to obtain posterior probabilities of selection for each variable using multivariable logistic regression followed by effect estimation using Prentice-weighted Cox regression.
RESULTS: Across nine different simulation scenarios, the two-step method demonstrated higher sensitivity and lower false discovery rate than the one-at-a-time and stepwise methods. In an application of the methods to data from the EPIC-InterAct case-cohort study, the two-step method identified an additional two fatty acids as being associated with incident type 2 diabetes, compared with the one-at-a-time and stepwise methods.
CONCLUSIONS: The two-step method enables more powerful and accurate detection of exposure-outcome associations in case-cohort studies. An R package is available to enable researchers to apply this method.
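The abstract compares the selection methods by sensitivity and false discovery rate. As a reading aid, the sketch below shows how those two quantities are commonly computed for a single simulated data set in which the truly associated variables are known. It is not taken from the article or its R package: the function name, the variable labels and the convention of reporting a false discovery rate of 0 when nothing is selected are all assumptions made for illustration.

```python
def selection_metrics(selected, truly_associated):
    """Return (sensitivity, false discovery rate) for one selection result.

    Hypothetical helper for illustration; not part of the article's R package.
    """
    selected = set(selected)
    truly_associated = set(truly_associated)
    true_positives = len(selected & truly_associated)

    # Sensitivity: share of the truly associated variables that were selected.
    if truly_associated:
        sensitivity = true_positives / len(truly_associated)
    else:
        sensitivity = float("nan")  # undefined when there is nothing to detect

    # False discovery rate: share of the selected variables that are not truly
    # associated. Assumed convention: 0.0 when no variable is selected.
    if selected:
        fdr = (len(selected) - true_positives) / len(selected)
    else:
        fdr = 0.0

    return sensitivity, fdr


# Made-up example: four variables have true effects, a method selects three.
truth = {"x1", "x2", "x3", "x4"}
sens, fdr = selection_metrics({"x1", "x2", "x7"}, truth)
print(f"sensitivity={sens:.2f}, FDR={fdr:.2f}")
```

In a simulation study these values would typically be averaged over many replicate data sets per scenario before the methods are compared.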
format Online
Article
Text
id pubmed-5913627
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5913627 2018-04-30 A two-step method for variable selection in the analysis of a case-cohort study. Newcombe, P J; Connolly, S; Seaman, S; Richardson, S; Sharp, S J. Int J Epidemiol. Methods.
Oxford University Press 2018-04 2017-11-10 /pmc/articles/PMC5913627/ /pubmed/29136145 http://dx.doi.org/10.1093/ije/dyx224 Text en © The Author 2017. Published by Oxford University Press on behalf of the International Epidemiological Association. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title A two-step method for variable selection in the analysis of a case-cohort study
topic Methods