
Optimal Design of Low-Density SNP Arrays for Genomic Prediction: Algorithm and Applications

Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for the optimal design of LD SNP chips. A multiple-objective, local optimization (MOLO) algorithm was developed for d...


Bibliographic Details
Main authors: Wu, Xiao-Lin; Xu, Jiaqi; Feng, Guofei; Wiggans, George R.; Taylor, Jeremy F.; He, Jun; Qian, Changsong; Qiu, Jiansheng; Simpson, Barry; Walker, Jeremy; Bauck, Stewart
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5008792/
https://www.ncbi.nlm.nih.gov/pubmed/27583971
http://dx.doi.org/10.1371/journal.pone.0161719
author Wu, Xiao-Lin
Xu, Jiaqi
Feng, Guofei
Wiggans, George R.
Taylor, Jeremy F.
He, Jun
Qian, Changsong
Qiu, Jiansheng
Simpson, Barry
Walker, Jeremy
Bauck, Stewart
collection PubMed
description Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for the optimal design of LD SNP chips. A multiple-objective, local optimization (MOLO) algorithm was developed for design of optimal LD SNP chips that can be imputed accurately to medium-density (MD) or high-density (HD) SNP genotypes for genomic prediction. The objective function facilitates maximization of non-gap map length and system information for the SNP chip, and the latter is computed either as locus-averaged (LASE) or haplotype-averaged Shannon entropy (HASE) and adjusted for uniformity of the SNP distribution. HASE performed better than LASE with ≤1,000 SNPs, but required considerably more computing time. Nevertheless, the differences diminished when >5,000 SNPs were selected. Optimization was accomplished conditionally on the presence of SNPs that were obligated to each chromosome. The frame location of SNPs on a chip can be either uniform (evenly spaced) or non-uniform. For the latter design, a tunable empirical Beta distribution was used to guide location distribution of frame SNPs such that both ends of each chromosome were enriched with SNPs. The SNP distribution on each chromosome was finalized through the objective function that was locally and empirically maximized. This MOLO algorithm was capable of selecting a set of approximately evenly-spaced and highly-informative SNPs, which in turn led to increased imputation accuracy compared with selection solely of evenly-spaced SNPs. Imputation accuracy increased with LD chip size, and imputation error rate was extremely low for chips with ≥3,000 SNPs. Assuming that genotyping or imputation error occurs at random, imputation error rate can be viewed as the upper limit for genomic prediction error. 
Our results show that about 25% of the imputation error rate was propagated to genomic prediction in an Angus population. The utility of this MOLO algorithm was also demonstrated in a real application, in which a 6K SNP panel was optimized conditional on 5,260 obligatory SNPs selected based on SNP-trait association in U.S. Holstein animals. With this MOLO algorithm, both imputation error rate and genomic prediction error rate were minimal.
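The description above names locus-averaged Shannon entropy (LASE) as one measure of the information carried by a set of SNPs. The following is a minimal sketch of that idea, not the authors' MOLO implementation. It assumes biallelic SNPs, entropy computed from allele frequencies in bits, and a simplified selection rule (one highest-entropy SNP per equal-width bin) in place of the paper's objective function. All function names and the example data are hypothetical.

```python
import math


def locus_entropy(p):
    """Shannon entropy (bits) of a biallelic locus with allele frequency p.

    Assumption: entropy is taken over the two allele frequencies p and 1 - p.
    """
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a monomorphic locus carries no information
    q = 1.0 - p
    return -(p * math.log2(p) + q * math.log2(q))


def locus_averaged_entropy(freqs):
    """Mean per-locus entropy over a set of SNPs (a LASE-style score)."""
    if not freqs:
        return 0.0
    return sum(locus_entropy(p) for p in freqs) / len(freqs)


def pick_per_bin(positions, freqs, chrom_length, n_bins):
    """Simplified stand-in for local optimization (hypothetical helper).

    Splits one chromosome into n_bins equal-width bins and keeps, in each
    non-empty bin, the index of the SNP with the highest entropy.
    """
    best = {}
    for i, (pos, p) in enumerate(zip(positions, freqs)):
        b = min(int(pos * n_bins / chrom_length), n_bins - 1)
        h = locus_entropy(p)
        if b not in best or h > best[b][0]:
            best[b] = (h, i)
    return [best[b][1] for b in sorted(best)]


# Illustrative data only: four SNPs on a chromosome of length 100.
positions = [10, 40, 60, 90]
freqs = [0.1, 0.5, 0.3, 0.45]
chosen = pick_per_bin(positions, freqs, chrom_length=100, n_bins=2)
score = locus_averaged_entropy([freqs[i] for i in chosen])
```

Picking one SNP per bin keeps the selected markers roughly evenly spaced, while the entropy criterion favors markers with intermediate allele frequencies. This mirrors, in a much reduced form, the trade-off between uniformity and information that the description attributes to the objective function.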
format Online
Article
Text
id pubmed-5008792
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5008792 2016-09-27 PLoS One Research Article Public Library of Science 2016-09-01 /pmc/articles/PMC5008792/ /pubmed/27583971 http://dx.doi.org/10.1371/journal.pone.0161719 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication.
title Optimal Design of Low-Density SNP Arrays for Genomic Prediction: Algorithm and Applications
topic Research Article