Identifying Prognostic SNPs in Clinical Cohorts: Complementing Univariate Analyses by Resampling and Multivariable Modeling

Clinical cohorts with time-to-event endpoints are increasingly characterized by measurements of single nucleotide polymorphisms (SNPs) whose number is an order of magnitude larger than the number of measurements typically considered at the gene level. At the same time, the size of clinical cohorts often is st...

Bibliographic Details
Main authors: Hieke, Stefanie; Benner, Axel; Schlenk, Richard F.; Schumacher, Martin; Bullinger, Lars; Binder, Harald
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4861340/
https://www.ncbi.nlm.nih.gov/pubmed/27159447
http://dx.doi.org/10.1371/journal.pone.0155226
author Hieke, Stefanie
Benner, Axel
Schlenk, Richard F.
Schumacher, Martin
Bullinger, Lars
Binder, Harald
author_sort Hieke, Stefanie
collection PubMed
description Clinical cohorts with time-to-event endpoints are increasingly characterized by measurements of single nucleotide polymorphisms (SNPs) whose number is an order of magnitude larger than the number of measurements typically considered at the gene level. At the same time, the size of clinical cohorts is often still limited, calling for novel analysis strategies for identifying potentially prognostic SNPs that can help to better characterize disease processes. We propose such a strategy, drawing on univariate testing ideas from epidemiological case-control studies on the one hand and on multivariable regression techniques developed for gene expression data on the other. In particular, we focus on stable selection of a small set of SNPs and corresponding genes for subsequent validation. For univariate analysis, a permutation-based approach is proposed to test at the gene level. We use regularized multivariable regression models to consider all SNPs simultaneously and to select a small set of potentially important prognostic SNPs. Stability is judged according to resampling inclusion frequencies for both the univariate and the multivariable approach. The overall strategy is illustrated with data from a cohort of acute myeloid leukemia patients and explored in a simulation study. The multivariable approach is seen to automatically focus on a smaller set of SNPs than the univariate approach, roughly in line with blocks of correlated SNPs. This more targeted extraction of SNPs results in more stable selection at the SNP as well as at the gene level. Thus, within the proposed analysis strategy for SNP data in clinical cohorts, the multivariable regression approach with resampling highlights what regularized regression techniques can add to univariate analyses.
format Online
Article
Text
id pubmed-4861340
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4861340 2016-05-13 PLoS One Research Article Public Library of Science 2016-05-09 /pmc/articles/PMC4861340/ /pubmed/27159447 http://dx.doi.org/10.1371/journal.pone.0155226 Text en © 2016 Hieke et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Identifying Prognostic SNPs in Clinical Cohorts: Complementing Univariate Analyses by Resampling and Multivariable Modeling
topic Research Article