A Fast Multi-Locus Ridge Regression Algorithm for High-Dimensional Genome-Wide Association Studies

The mixed linear model (MLM) has been widely used in genome-wide association study (GWAS) to dissect quantitative traits in human, animal, and plant genetics. Most methodologies consider all single nucleotide polymorphism (SNP) effects as random effects under the MLM framework, which fail to detect...

Bibliographic Details
Main Authors: Zhang, Jin; Chen, Min; Wen, Yangjun; Zhang, Yin; Lu, Yunan; Wang, Shengmeng; Chen, Juncong
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8041068/
https://www.ncbi.nlm.nih.gov/pubmed/33854527
http://dx.doi.org/10.3389/fgene.2021.649196
author Zhang, Jin
Chen, Min
Wen, Yangjun
Zhang, Yin
Lu, Yunan
Wang, Shengmeng
Chen, Juncong
collection PubMed
description The mixed linear model (MLM) has been widely used in genome-wide association study (GWAS) to dissect quantitative traits in human, animal, and plant genetics. Most methodologies consider all single nucleotide polymorphism (SNP) effects as random effects under the MLM framework, which fail to detect the joint minor effect of multiple genetic markers on a trait. Therefore, polygenes with minor effects remain largely unexplored in today’s big data era. In this study, we developed a new algorithm under the MLM framework, which is called the fast multi-locus ridge regression (FastRR) algorithm. The FastRR algorithm first whitens the covariance matrix of the polygenic matrix K and environmental noise, then selects potentially related SNPs among large scale markers, which have a high correlation with the target trait, and finally analyzes the subset variables using a multi-locus deshrinking ridge regression for true quantitative trait nucleotide (QTN) detection. Results from the analyses of both simulated and real data show that the FastRR algorithm is more powerful for both large and small QTN detection, more accurate in QTN effect estimation, and has more stable results under various polygenic backgrounds. Moreover, compared with existing methods, the FastRR algorithm has the advantage of high computing speed. In conclusion, the FastRR algorithm provides an alternative algorithm for multi-locus GWAS in high dimensional genomic datasets.
format Online
Article
Text
id pubmed-8041068
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8041068 2021-04-13 A Fast Multi-Locus Ridge Regression Algorithm for High-Dimensional Genome-Wide Association Studies Zhang, Jin Chen, Min Wen, Yangjun Zhang, Yin Lu, Yunan Wang, Shengmeng Chen, Juncong Front Genet Genetics Frontiers Media S.A. 2021-03-29 /pmc/articles/PMC8041068/ /pubmed/33854527 http://dx.doi.org/10.3389/fgene.2021.649196 Text en Copyright © 2021 Zhang, Chen, Wen, Zhang, Lu, Wang and Chen.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title A Fast Multi-Locus Ridge Regression Algorithm for High-Dimensional Genome-Wide Association Studies
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8041068/
https://www.ncbi.nlm.nih.gov/pubmed/33854527
http://dx.doi.org/10.3389/fgene.2021.649196
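
The description field names three steps: whitening the covariance built from the polygenic matrix K and the environmental noise, screening SNPs by their correlation with the trait, and fitting a multi-locus ridge regression on the retained subset. The NumPy sketch below illustrates that flow under stated assumptions and is not the authors' implementation. The function names, the variance ratio `lam` (polygenic to residual variance, taken here as known rather than estimated), the subset size `k`, and the ridge penalty `alpha` are all hypothetical. The deshrinking step is not specified in the record, so the sketch stops at a plain ridge estimate.

```python
import numpy as np


def whiten(y, X, K, lam):
    """Rotate y and X so that noise with covariance lam*K + I becomes white."""
    s, U = np.linalg.eigh(K)                  # K = U diag(s) U^T
    s = np.clip(s, 0.0, None)                 # guard against tiny negative eigenvalues
    T = (1.0 / np.sqrt(lam * s + 1.0))[:, None] * U.T
    return T @ y, T @ X                       # T (lam*K + I) T^T = I


def screen(y, X, k):
    """Indices of the k columns of X with the largest |Pearson correlation| with y."""
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    r = np.abs(Xc.T @ yc) / np.where(denom > 0, denom, np.inf)
    return np.argsort(-r)[:k]


def ridge(y, X, alpha):
    """Closed-form ridge estimate (X^T X + alpha I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)


def fastrr_sketch(y, X, K, lam, k, alpha):
    """Whiten, screen, then ridge-fit the screened subset (no deshrinking)."""
    yw, Xw = whiten(y, X, K, lam)
    keep = screen(yw, Xw, k)
    beta = ridge(yw, Xw[:, keep], alpha)
    return keep, beta


if __name__ == "__main__":
    # Simulated data only: 200 individuals, 500 standardized markers, 2 causal SNPs.
    rng = np.random.default_rng(1)
    n, p = 200, 500
    X = rng.standard_normal((n, p))
    K = X @ X.T / p                           # marker-based kinship (an assumption)
    y = 2.0 * X[:, 10] - 1.5 * X[:, 200] + rng.standard_normal(n)
    keep, beta = fastrr_sketch(y, X, K, lam=1.0, k=20, alpha=1.0)
    print(sorted(zip(keep.tolist(), np.round(beta, 2).tolist())))
```

Screening before the ridge fit keeps the linear solve at size `k` rather than at the full marker count, which is one plausible reading of where the speed advantage mentioned in the description comes from.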