A multivariable approach for risk markers from pooled molecular data with only partial overlap
BACKGROUND: Increasingly, molecular measurements from multiple studies are pooled to identify risk scores, with only partial overlap of measurements available from different studies. Univariate analyses of such markers have routinely been performed in such settings using meta-analysis techniques in...
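Main authors: | Stelzer, Anne-Sophie; Maccioni, Livia; Gerhold-Ay, Aslihan; Smedby, Karin E.; Schumacher, Martin; Nieters, Alexandra; Binder, Harald |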
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | Technical Advance |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6642584/ https://www.ncbi.nlm.nih.gov/pubmed/31324155 http://dx.doi.org/10.1186/s12881-019-0849-0 |
_version_ | 1783437006323318784 |
---|---|
author | Stelzer, Anne-Sophie Maccioni, Livia Gerhold-Ay, Aslihan Smedby, Karin E. Schumacher, Martin Nieters, Alexandra Binder, Harald |
author_facet | Stelzer, Anne-Sophie Maccioni, Livia Gerhold-Ay, Aslihan Smedby, Karin E. Schumacher, Martin Nieters, Alexandra Binder, Harald |
author_sort | Stelzer, Anne-Sophie |
collection | PubMed |
description | BACKGROUND: Increasingly, molecular measurements from multiple studies are pooled to identify risk scores, with only partial overlap of measurements available from different studies. Univariate analyses of such markers have routinely been performed in such settings using meta-analysis techniques in genome-wide association studies for identifying genetic risk scores. In contrast, multivariable techniques such as regularized regression, which might potentially be more powerful, are hampered by only partial overlap of available markers even when the pooling of individual level data is feasible for analysis. This cannot easily be addressed at a preprocessing level, as quality criteria in the different studies may result in differential availability of markers – even after imputation. METHODS: Motivated by data from the InterLymph Consortium on risk factors for non-Hodgkin lymphoma, which exhibits these challenges, we adapted a regularized regression approach, componentwise boosting, for dealing with partial overlap in SNPs. This synthesis regression approach is combined with resampling to determine stable sets of single nucleotide polymorphisms, which could feed into a genetic risk score. The proposed approach is contrasted with univariate analyses, an application of the lasso, and with an analysis that discards studies causing the partial overlap. The question of statistical significance is faced with an approach called stability selection. RESULTS: Using an excerpt of the data from the InterLymph Consortium on two specific subtypes of non-Hodgkin lymphoma, it is shown that componentwise boosting can take into account all applicable information from different SNPs, irrespective of whether they are covered by all investigated studies and for all individuals in the single studies. The results indicate increased power, even when studies that would be discarded in a complete case analysis only comprise a small proportion of individuals. CONCLUSIONS: Given the observed gains in power, the proposed approach can be recommended more generally whenever there is only partial overlap of molecular measurements obtained from pooled studies and/or missing data in single studies. A corresponding software implementation is available upon request. TRIAL REGISTRATION: All involved studies have provided signed GWAS data submission certifications to the U.S. National Institute of Health and have been retrospectively registered. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12881-019-0849-0) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6642584 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-66425842019-07-29 A multivariable approach for risk markers from pooled molecular data with only partial overlap Stelzer, Anne-Sophie Maccioni, Livia Gerhold-Ay, Aslihan Smedby, Karin E. Schumacher, Martin Nieters, Alexandra Binder, Harald BMC Med Genet Technical Advance BACKGROUND: Increasingly, molecular measurements from multiple studies are pooled to identify risk scores, with only partial overlap of measurements available from different studies. Univariate analyses of such markers have routinely been performed in such settings using meta-analysis techniques in genome-wide association studies for identifying genetic risk scores. In contrast, multivariable techniques such as regularized regression, which might potentially be more powerful, are hampered by only partial overlap of available markers even when the pooling of individual level data is feasible for analysis. This cannot easily be addressed at a preprocessing level, as quality criteria in the different studies may result in differential availability of markers – even after imputation. METHODS: Motivated by data from the InterLymph Consortium on risk factors for non-Hodgkin lymphoma, which exhibits these challenges, we adapted a regularized regression approach, componentwise boosting, for dealing with partial overlap in SNPs. This synthesis regression approach is combined with resampling to determine stable sets of single nucleotide polymorphisms, which could feed into a genetic risk score. The proposed approach is contrasted with univariate analyses, an application of the lasso, and with an analysis that discards studies causing the partial overlap. The question of statistical significance is faced with an approach called stability selection. RESULTS: Using an excerpt of the data from the InterLymph Consortium on two specific subtypes of non-Hodgkin lymphoma, it is shown that componentwise boosting can take into account all applicable information from different SNPs, irrespective of whether they are covered by all investigated studies and for all individuals in the single studies. The results indicate increased power, even when studies that would be discarded in a complete case analysis only comprise a small proportion of individuals. CONCLUSIONS: Given the observed gains in power, the proposed approach can be recommended more generally whenever there is only partial overlap of molecular measurements obtained from pooled studies and/or missing data in single studies. A corresponding software implementation is available upon request. TRIAL REGISTRATION: All involved studies have provided signed GWAS data submission certifications to the U.S. National Institute of Health and have been retrospectively registered. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12881-019-0849-0) contains supplementary material, which is available to authorized users. BioMed Central 2019-07-19 /pmc/articles/PMC6642584/ /pubmed/31324155 http://dx.doi.org/10.1186/s12881-019-0849-0 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Technical Advance Stelzer, Anne-Sophie Maccioni, Livia Gerhold-Ay, Aslihan Smedby, Karin E. Schumacher, Martin Nieters, Alexandra Binder, Harald A multivariable approach for risk markers from pooled molecular data with only partial overlap |
title | A multivariable approach for risk markers from pooled molecular data with only partial overlap |
title_full | A multivariable approach for risk markers from pooled molecular data with only partial overlap |
title_fullStr | A multivariable approach for risk markers from pooled molecular data with only partial overlap |
title_full_unstemmed | A multivariable approach for risk markers from pooled molecular data with only partial overlap |
title_short | A multivariable approach for risk markers from pooled molecular data with only partial overlap |
title_sort | multivariable approach for risk markers from pooled molecular data with only partial overlap |
topic | Technical Advance |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6642584/ https://www.ncbi.nlm.nih.gov/pubmed/31324155 http://dx.doi.org/10.1186/s12881-019-0849-0 |
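
The METHODS summary in the description field (componentwise boosting under partial overlap, combined with resampling) can be illustrated with a small sketch. This is not the authors' implementation, which the record says is available upon request. It is a minimal NumPy example under stated assumptions: a logistic model, markers coded as NaN where a study did not measure them, selection by a per-marker score statistic computed only from individuals with that marker, and subsampling of half the individuals to obtain selection frequencies in the spirit of stability selection. The function names `componentwise_boost` and `selection_frequencies`, the step size `nu`, and the toy data are all hypothetical.

```python
import numpy as np


def componentwise_boost(X, y, steps=100, nu=0.1):
    """Componentwise boosting for a logistic model (illustrative sketch).

    NaN in X marks a marker that was not measured for that individual.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = ~np.isnan(X)
    # Center each marker on its observed values; unmeasured entries become 0,
    # so they add nothing to the linear predictor or to the sums below.
    Xc = np.where(observed, X - np.nanmean(X, axis=0), 0.0)
    intercept = np.log(y.mean() / (1.0 - y.mean()))
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        prob = 1.0 / (1.0 + np.exp(-(intercept + Xc @ beta)))
        # Per-marker score and Fisher information, using only the rows
        # in which the marker was measured.
        score = Xc.T @ (y - prob)
        info = (Xc ** 2).T @ (prob * (1.0 - prob))
        stat = np.where(info > 0, score ** 2 / np.maximum(info, 1e-12), 0.0)
        j = int(np.argmax(stat))
        if stat[j] <= 0:
            break
        # Update only the best marker, with a damped Newton step.
        beta[j] += nu * score[j] / info[j]
    return intercept, beta


def selection_frequencies(X, y, n_subsamples=50, steps=10, seed=0):
    """Fraction of half-size subsamples in which each marker is selected."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(y)
    counts = np.zeros(X.shape[1])
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        _, beta = componentwise_boost(X[idx], y[idx], steps=steps)
        counts += beta != 0
    return counts / n_subsamples


# Toy data (hypothetical): marker 0 carries the signal; two "studies"
# each lack one marker, giving only partial overlap.
rng = np.random.default_rng(1)
n, p = 600, 5
X = rng.normal(size=(n, p))
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-1.5 * X[:, 0]))).astype(float)
X[500:, 0] = np.nan  # last 100 individuals: marker 0 not measured
X[:200, 3] = np.nan  # first 200 individuals: marker 3 not measured

intercept, beta = componentwise_boost(X, y)
freq = selection_frequencies(X, y)
```

In this sketch no individual or study is discarded: a marker missing in one study simply contributes nothing for those rows, whereas a complete case analysis would drop 300 of the 600 toy individuals. How the published method weights or standardizes markers with different coverage is not stated in this record, so the score-statistic criterion here should be read as one plausible choice rather than the paper's.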