SLEMM: million-scale genomic predictions with window-based SNP weighting

Bibliographic Details
Main Authors: Cheng, Jian, Maltecca, Christian, VanRaden, Paul M, O'Connell, Jeffrey R, Ma, Li, Jiang, Jicai
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10039786/
https://www.ncbi.nlm.nih.gov/pubmed/36897019
http://dx.doi.org/10.1093/bioinformatics/btad127
Collection: PubMed
Record ID: pubmed-10039786
Record date: 2023-03-26
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Journal: Bioinformatics (Original Paper), published 2023-03-10
Rights: © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract:
MOTIVATION: The amount of genomic data is increasing exponentially. Using many genotyped and phenotyped individuals for genomic prediction is appealing yet challenging.
RESULTS: We present SLEMM (short for Stochastic-Lanczos-Expedited Mixed Models), a new software tool, to address the computational challenge. SLEMM builds on an efficient implementation of the stochastic Lanczos algorithm for REML in a framework of mixed models. We further implement SNP weighting in SLEMM to improve its predictions. Extensive analyses on seven public datasets, covering 19 polygenic traits in three plant and three livestock species, showed that SLEMM with SNP weighting had overall the best predictive ability among a variety of genomic prediction methods including GCTA’s empirical BLUP, BayesR, KAML, and LDAK’s BOLT and BayesR models. We also compared the methods using nine dairy traits of ∼300k genotyped cows. All had overall similar prediction accuracies, except that KAML failed to process the data. Additional simulation analyses on up to 3 million individuals and 1 million SNPs showed that SLEMM was advantageous over counterparts as for computational performance. Overall, SLEMM can do million-scale genomic predictions with an accuracy comparable to BayesR.
AVAILABILITY AND IMPLEMENTATION: The software is available at https://github.com/jiang18/slemm.
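The abstract states that SLEMM builds on a stochastic Lanczos algorithm for REML, but this record does not describe that implementation. The block below is therefore only a generic, hedged sketch of one well-known stochastic Lanczos technique, stochastic Lanczos quadrature, applied to a quantity that any REML likelihood for a mixed model needs: the log-determinant of the phenotypic covariance matrix V = sigma_g2 * K + sigma_e2 * I, estimated from matrix-vector products alone so that the n x n relationship matrix K is never formed. The function names (`lanczos_tridiag`, `slq_logdet`), the simulated genotype-like matrix Z, and the variance-component values are illustrative assumptions, not SLEMM's API, data, or exact estimator.

```python
import numpy as np


def lanczos_tridiag(matvec, v, steps):
    """Run up to `steps` Lanczos iterations for a symmetric operator, starting at v.

    Returns the diagonal (alpha) and off-diagonal (beta) of the Lanczos
    tridiagonal matrix T. Full reorthogonalization keeps this sketch stable.
    """
    n = v.shape[0]
    Q = np.zeros((n, steps))
    alpha = np.zeros(steps)
    beta = np.zeros(max(steps - 1, 0))
    Q[:, 0] = v / np.linalg.norm(v)
    for j in range(steps):
        w = matvec(Q[:, j])
        alpha[j] = Q[:, j] @ w
        basis = Q[:, : j + 1]
        for _ in range(2):  # Gram-Schmidt twice against all Lanczos vectors so far
            w = w - basis @ (basis.T @ w)
        if j + 1 < steps:
            b = np.linalg.norm(w)
            if b < 1e-12:  # Krylov subspace exhausted: return the smaller T
                return alpha[: j + 1], beta[:j]
            beta[j] = b
            Q[:, j + 1] = w / b
    return alpha, beta


def slq_logdet(matvec, n, num_probes=30, steps=30, seed=0):
    """Estimate log det(A) for symmetric positive definite A, given only x -> A x.

    Hutchinson trace estimation with Rademacher probes v, where each
    v^T log(A) v is approximated by Gauss quadrature from the Lanczos matrix T.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_probes):
        v = rng.choice([-1.0, 1.0], size=n)  # Rademacher probe, so ||v||^2 = n
        alpha, beta = lanczos_tridiag(matvec, v, steps)
        T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
        theta, U = np.linalg.eigh(T)  # Ritz values serve as quadrature nodes
        # Quadrature weights are the squared first components of T's eigenvectors.
        total += n * np.sum(U[0, :] ** 2 * np.log(theta))
    return total / num_probes


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 300, 1000
    # Hypothetical simulated stand-in for standardized SNP codes; not real data.
    Z = rng.standard_normal((n, p))
    sigma_g2, sigma_e2 = 1.0, 1.0  # illustrative variance components

    def v_matvec(x):
        # V x = sigma_g2 * (Z Z^T / p) x + sigma_e2 * x, without building K
        return sigma_g2 * (Z @ (Z.T @ x)) / p + sigma_e2 * x

    est = slq_logdet(v_matvec, n, num_probes=30, steps=30, seed=1)
    exact = np.linalg.slogdet(sigma_g2 * (Z @ Z.T) / p + sigma_e2 * np.eye(n))[1]
    print(f"SLQ estimate: {est:.2f}  exact: {exact:.2f}")
```

Because only `matvec` touches the data, the cost of each product is governed by the genotype matrix rather than by an explicit n x n relationship matrix, which is the kind of property that makes Lanczos-based methods attractive at the scale the abstract describes. The estimate is random; more probes reduce its variance, and more Lanczos steps reduce the quadrature error.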