Comparison of dimension reduction-based logistic regression models for case-control genome-wide association study: principal components analysis vs. partial least squares

With recent advances in biotechnology, genome-wide association studies (GWAS) have been widely used to identify genetic variants that underlie complex human diseases and traits. In case-control GWAS, the typical statistical strategy is traditional logistic regression (LR) based on single-locus analysis....

Bibliographic Details
Main authors: Yi, Honggang, Wo, Hongmei, Zhao, Yang, Zhang, Ruyang, Dai, Junchen, Jin, Guangfu, Ma, Hongxia, Wu, Tangchun, Hu, Zhibin, Lin, Dongxin, Shen, Hongbing, Chen, Feng
Format: Online Article Text
Language: English
Published: Editorial Department of Journal of Biomedical Research 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4547378/
https://www.ncbi.nlm.nih.gov/pubmed/26243516
http://dx.doi.org/10.7555/JBR.29.20140043
author Yi, Honggang
Wo, Hongmei
Zhao, Yang
Zhang, Ruyang
Dai, Junchen
Jin, Guangfu
Ma, Hongxia
Wu, Tangchun
Hu, Zhibin
Lin, Dongxin
Shen, Hongbing
Chen, Feng
collection PubMed
description With recent advances in biotechnology, genome-wide association studies (GWAS) have been widely used to identify genetic variants that underlie complex human diseases and traits. In case-control GWAS, the typical statistical strategy is traditional logistic regression (LR) based on single-locus analysis. However, such single-locus analysis leads to the well-known multiplicity problem, with a risk of inflating type I error and reducing power. Dimension reduction-based techniques, such as principal component-based logistic regression (PC-LR) and partial least squares-based logistic regression (PLS-LR), have recently gained much attention in the analysis of high-dimensional genomic data. However, the performance of these methods is still unclear, especially in GWAS. We conducted simulations and a real-data application to compare the type I error and power of PC-LR, PLS-LR and LR as applied to GWAS within a defined single nucleotide polymorphism (SNP) set region. We found that PC-LR and PLS-LR can reasonably control type I error under the null hypothesis. In contrast, LR corrected by the Bonferroni method was more conservative in all simulation settings. In particular, we found that PC-LR and PLS-LR had comparable power and both outperformed LR, especially when the causal SNP was in high linkage disequilibrium with genotyped ones and had a small effect size in the simulation. Based on SNP set analysis, we applied all three methods to analyze non-small cell lung cancer GWAS data.
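As an illustrative aside that is not taken from the article, the multiplicity problem mentioned in the description can be sketched numerically. The snippet below assumes m independent single-SNP tests, which is a simplification: SNPs in linkage disequilibrium are correlated, and in that case the Bonferroni correction is more conservative than the independent-test figures suggest. The function names `familywise_error` and `bonferroni_threshold` are hypothetical and chosen only for this example.

```python
def familywise_error(alpha: float, m: int) -> float:
    """Probability of at least one false positive among m independent
    tests, each run at per-test significance level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test significance level that keeps the family-wise error
    rate at or below alpha when m tests are performed."""
    return alpha / m


if __name__ == "__main__":
    alpha = 0.05
    for m in (1, 10, 20, 100):  # hypothetical SNP-set sizes
        uncorrected = familywise_error(alpha, m)
        corrected = familywise_error(bonferroni_threshold(alpha, m), m)
        print(f"m={m:>3}  uncorrected FWER={uncorrected:.4f}  "
              f"Bonferroni FWER={corrected:.4f}")
```

Under the independence assumption, the uncorrected family-wise error rate grows quickly with the number of SNPs tested, while the Bonferroni-corrected rate stays at or just below the nominal level. Set-based methods such as PC-LR and PLS-LR sidestep this by testing a small number of derived components rather than every SNP separately.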
format Online
Article
Text
id pubmed-4547378
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Editorial Department of Journal of Biomedical Research
record_format MEDLINE/PubMed
spelling pubmed-4547378 2015-09-01 J Biomed Res Original Article Editorial Department of Journal of Biomedical Research 2015-07 2015-04-20 /pmc/articles/PMC4547378/ /pubmed/26243516 http://dx.doi.org/10.7555/JBR.29.20140043 Text en © 2015 the Journal of Biomedical Research. All rights reserved.
title Comparison of dimension reduction-based logistic regression models for case-control genome-wide association study: principal components analysis vs. partial least squares
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4547378/
https://www.ncbi.nlm.nih.gov/pubmed/26243516
http://dx.doi.org/10.7555/JBR.29.20140043