A multivariable approach for risk markers from pooled molecular data with only partial overlap

BACKGROUND: Increasingly, molecular measurements from multiple studies are pooled to identify risk scores, with only partial overlap of measurements available from different studies. Univariate analyses of such markers have routinely been performed in such settings using meta-analysis techniques in...

Bibliographic Details
Main authors: Stelzer, Anne-Sophie; Maccioni, Livia; Gerhold-Ay, Aslihan; Smedby, Karin E.; Schumacher, Martin; Nieters, Alexandra; Binder, Harald
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects: Technical Advance
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6642584/
https://www.ncbi.nlm.nih.gov/pubmed/31324155
http://dx.doi.org/10.1186/s12881-019-0849-0
collection PubMed
description BACKGROUND: Increasingly, molecular measurements from multiple studies are pooled to identify risk scores, with only partial overlap of measurements available from different studies. In genome-wide association studies, univariate analyses of such markers are routinely performed in these settings using meta-analysis techniques to identify genetic risk scores. In contrast, multivariable techniques such as regularized regression, which may be more powerful, are hampered by only partial overlap of available markers, even when pooling of individual-level data is feasible for analysis. This cannot easily be addressed at a preprocessing level, as quality criteria in the different studies may result in differential availability of markers, even after imputation.
METHODS: Motivated by data from the InterLymph Consortium on risk factors for non-Hodgkin lymphoma, which exhibit these challenges, we adapted a regularized regression approach, componentwise boosting, to deal with partial overlap in single nucleotide polymorphisms (SNPs). This synthesis regression approach is combined with resampling to determine stable sets of SNPs, which could feed into a genetic risk score. The proposed approach is contrasted with univariate analyses, an application of the lasso, and an analysis that discards the studies causing the partial overlap. Statistical significance is addressed using stability selection.
RESULTS: Using an excerpt of the data from the InterLymph Consortium on two specific subtypes of non-Hodgkin lymphoma, it is shown that componentwise boosting can take into account all applicable information from different SNPs, irrespective of whether they are covered by all investigated studies and for all individuals in the single studies. The results indicate increased power, even when the studies that would be discarded in a complete case analysis comprise only a small proportion of individuals.
CONCLUSIONS: Given the observed gains in power, the proposed approach can be recommended more generally whenever there is only partial overlap of molecular measurements obtained from pooled studies and/or missing data in single studies. A corresponding software implementation is available upon request.
TRIAL REGISTRATION: All involved studies have provided signed GWAS data submission certifications to the U.S. National Institutes of Health and have been retrospectively registered.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12881-019-0849-0) contains supplementary material, which is available to authorized users.
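The idea summarized in the abstract, componentwise boosting that still uses markers measured in only some studies, combined with resampling to find stable marker sets, can be illustrated with a minimal Python sketch. This is not the authors' implementation (which the abstract says is available upon request). Several simplifications here are assumptions made for brevity and are not taken from the abstract: squared-error loss with a continuous outcome, missing markers coded as NaN, roughly centered markers so that an unmeasured value contributes zero to the linear predictor, and the hypothetical function names `componentwise_boost`, `inclusion_frequencies`, and `simulate_partial_overlap`.

```python
import numpy as np


def componentwise_boost(X, y, steps=20, nu=0.1):
    """Componentwise L2 boosting; NaN in X marks a marker that was not measured.

    Assumption: markers are roughly centered, so an unmeasured value
    contributes zero to the linear predictor.
    """
    observed = ~np.isnan(X)
    X0 = np.where(observed, X, 0.0)  # zero out unmeasured entries
    beta = np.zeros(X.shape[1])
    offset = y.mean()
    den = (X0 ** 2).sum(axis=0)  # per-marker sum of squares over measured rows
    den_safe = np.where(den > 0, den, 1.0)
    for _ in range(steps):
        resid = y - offset - X0 @ beta
        # Univariate least-squares fit of the residuals on each marker,
        # using only the rows where that marker was measured.
        num = (X0 * resid[:, None]).sum(axis=0)
        gamma = np.where(den > 0, num / den_safe, 0.0)
        # Assumed selection criterion: largest reduction in residual sum of squares.
        score = np.where(den > 0, num ** 2 / den_safe, 0.0)
        j = int(np.argmax(score))
        beta[j] += nu * gamma[j]  # update only the selected marker
    return beta


def inclusion_frequencies(X, y, n_resamples=20, seed=0, **boost_args):
    """Fraction of half-size subsamples in which each marker is selected."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_resamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        counts += (componentwise_boost(X[idx], y[idx], **boost_args) != 0)
    return counts / n_resamples


def simulate_partial_overlap(n=400, p=6, seed=0):
    """Toy pooled data: the last quarter of rows (one 'study') lacks marker 0."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    y = 2.0 * X[:, 0] + 0.5 * rng.standard_normal(n)
    X[3 * n // 4:, 0] = np.nan
    return X, y
```

In this toy setup, calling `componentwise_boost(*simulate_partial_overlap())` fits all rows while using marker 0 only where it was measured, so no study has to be discarded. The frequencies returned by `inclusion_frequencies` could then be thresholded to pick a stable marker set; the threshold and the error control used in stability selection are not specified in the abstract and are left out here.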
id pubmed-6642584
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6642584 2019-07-29 BMC Med Genet Technical Advance BioMed Central 2019-07-19 /pmc/articles/PMC6642584/ /pubmed/31324155 http://dx.doi.org/10.1186/s12881-019-0849-0 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Technical Advance
work_keys_str_mv AT stelzerannesophie amultivariableapproachforriskmarkersfrompooledmoleculardatawithonlypartialoverlap
AT maccionilivia amultivariableapproachforriskmarkersfrompooledmoleculardatawithonlypartialoverlap
AT gerholdayaslihan amultivariableapproachforriskmarkersfrompooledmoleculardatawithonlypartialoverlap
AT smedbykarine amultivariableapproachforriskmarkersfrompooledmoleculardatawithonlypartialoverlap
AT schumachermartin amultivariableapproachforriskmarkersfrompooledmoleculardatawithonlypartialoverlap
AT nietersalexandra amultivariableapproachforriskmarkersfrompooledmoleculardatawithonlypartialoverlap
AT binderharald amultivariableapproachforriskmarkersfrompooledmoleculardatawithonlypartialoverlap
AT stelzerannesophie multivariableapproachforriskmarkersfrompooledmoleculardatawithonlypartialoverlap
AT maccionilivia multivariableapproachforriskmarkersfrompooledmoleculardatawithonlypartialoverlap
AT gerholdayaslihan multivariableapproachforriskmarkersfrompooledmoleculardatawithonlypartialoverlap
AT smedbykarine multivariableapproachforriskmarkersfrompooledmoleculardatawithonlypartialoverlap
AT schumachermartin multivariableapproachforriskmarkersfrompooledmoleculardatawithonlypartialoverlap
AT nietersalexandra multivariableapproachforriskmarkersfrompooledmoleculardatawithonlypartialoverlap
AT binderharald multivariableapproachforriskmarkersfrompooledmoleculardatawithonlypartialoverlap