Comparison of dimension reduction-based logistic regression models for case-control genome-wide association study: principal components analysis vs. partial least squares
With recent advances in biotechnology, genome-wide association studies (GWAS) have been widely used to identify genetic variants that underlie human complex diseases and traits. In case-control GWAS, the typical statistical strategy is traditional logistic regression (LR) based on single-locus analysis....
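Main authors: | Yi, Honggang; Wo, Hongmei; Zhao, Yang; Zhang, Ruyang; Dai, Junchen; Jin, Guangfu; Ma, Hongxia; Wu, Tangchun; Hu, Zhibin; Lin, Dongxin; Shen, Hongbing; Chen, Feng |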
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Editorial Department of Journal of Biomedical Research, 2015 |
Subjects: | Original Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4547378/ https://www.ncbi.nlm.nih.gov/pubmed/26243516 http://dx.doi.org/10.7555/JBR.29.20140043 |
_version_ | 1782387061932687360 |
---|---|
author | Yi, Honggang; Wo, Hongmei; Zhao, Yang; Zhang, Ruyang; Dai, Junchen; Jin, Guangfu; Ma, Hongxia; Wu, Tangchun; Hu, Zhibin; Lin, Dongxin; Shen, Hongbing; Chen, Feng |
author_sort | Yi, Honggang |
collection | PubMed |
description | With recent advances in biotechnology, genome-wide association studies (GWAS) have been widely used to identify genetic variants that underlie human complex diseases and traits. In case-control GWAS, the typical statistical strategy is traditional logistic regression (LR) based on single-locus analysis. However, such a single-locus analysis leads to the well-known multiplicity problem, with a risk of inflating type I error and reducing power. Dimension reduction-based techniques, such as principal component-based logistic regression (PC-LR) and partial least squares-based logistic regression (PLS-LR), have recently gained much attention in the analysis of high-dimensional genomic data. However, the performance of these methods is still not clear, especially in GWAS. We conducted simulations and a real-data application to compare the type I error and power of PC-LR, PLS-LR and LR applicable to GWAS within a defined single nucleotide polymorphism (SNP) set region. We found that PC-LR and PLS-LR can reasonably control type I error under the null hypothesis. In contrast, LR corrected by the Bonferroni method was more conservative in all simulation settings. In particular, we found that PC-LR and PLS-LR had comparable power and that both outperformed LR, especially when the causal SNP was in high linkage disequilibrium with genotyped ones and had a small effect size in simulation. Based on SNP set analysis, we applied all three methods to analyze non-small cell lung cancer GWAS data. |
format | Online Article Text |
id | pubmed-4547378 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Editorial Department of Journal of Biomedical Research |
record_format | MEDLINE/PubMed |
spelling | pubmed-4547378 2015-09-01 J Biomed Res Original Article Editorial Department of Journal of Biomedical Research 2015-07 2015-04-20 /pmc/articles/PMC4547378/ /pubmed/26243516 http://dx.doi.org/10.7555/JBR.29.20140043 Text en © 2015 the Journal of Biomedical Research. All rights reserved. |
title | Comparison of dimension reduction-based logistic regression models for case-control genome-wide association study: principal components analysis vs. partial least squares |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4547378/ https://www.ncbi.nlm.nih.gov/pubmed/26243516 http://dx.doi.org/10.7555/JBR.29.20140043 |
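
The description field above notes that single-locus LR corrected by the Bonferroni method was conservative. As a rough illustration of that correction only (not the authors' code), the sketch below declares a SNP set significant when its smallest per-SNP p-value falls below alpha divided by the number of SNPs tested. The function name `bonferroni_set_test` and the p-values are hypothetical and do not come from the study.

```python
def bonferroni_set_test(p_values, alpha=0.05):
    """Bonferroni-corrected test for a SNP set.

    p_values: per-SNP p-values from single-locus tests (hypothetical inputs).
    Returns (reject, threshold, min_p), where the set is rejected when the
    smallest p-value is below alpha / (number of SNPs).
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("p_values must not be empty")
    threshold = alpha / m          # per-test significance level
    min_p = min(p_values)          # strongest single-locus signal in the set
    return min_p < threshold, threshold, min_p


# Made-up p-values for five SNPs in one region (illustration only).
example_p = [0.04, 0.012, 0.30, 0.008, 0.65]
reject, threshold, min_p = bonferroni_set_test(example_p)
print(reject, threshold, min_p)
```

Because the threshold shrinks as the number of SNPs grows, and correlated SNPs in high linkage disequilibrium are treated as if they were independent tests, this rule tends to be stricter than necessary, which is consistent with the conservativeness the abstract reports.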