Optimal Design of Low-Density SNP Arrays for Genomic Prediction: Algorithm and Applications
Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for the optimal design of LD SNP chips. A multiple-objective, local optimization (MOLO) algorithm was developed for d...
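Main authors: | Wu, Xiao-Lin; Xu, Jiaqi; Feng, Guofei; Wiggans, George R.; Taylor, Jeremy F.; He, Jun; Qian, Changsong; Qiu, Jiansheng; Simpson, Barry; Walker, Jeremy; Bauck, Stewart |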
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5008792/ https://www.ncbi.nlm.nih.gov/pubmed/27583971 http://dx.doi.org/10.1371/journal.pone.0161719 |
_version_ | 1782451440494575616 |
---|---|
author | Wu, Xiao-Lin Xu, Jiaqi Feng, Guofei Wiggans, George R. Taylor, Jeremy F. He, Jun Qian, Changsong Qiu, Jiansheng Simpson, Barry Walker, Jeremy Bauck, Stewart |
author_sort | Wu, Xiao-Lin |
collection | PubMed |
description | Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for the optimal design of LD SNP chips. A multiple-objective, local optimization (MOLO) algorithm was developed for design of optimal LD SNP chips that can be imputed accurately to medium-density (MD) or high-density (HD) SNP genotypes for genomic prediction. The objective function facilitates maximization of non-gap map length and system information for the SNP chip, and the latter is computed either as locus-averaged (LASE) or haplotype-averaged Shannon entropy (HASE) and adjusted for uniformity of the SNP distribution. HASE performed better than LASE with ≤1,000 SNPs, but required considerably more computing time. Nevertheless, the differences diminished when >5,000 SNPs were selected. Optimization was accomplished conditionally on the presence of SNPs that were obligated to each chromosome. The frame location of SNPs on a chip can be either uniform (evenly spaced) or non-uniform. For the latter design, a tunable empirical Beta distribution was used to guide location distribution of frame SNPs such that both ends of each chromosome were enriched with SNPs. The SNP distribution on each chromosome was finalized through the objective function that was locally and empirically maximized. This MOLO algorithm was capable of selecting a set of approximately evenly-spaced and highly-informative SNPs, which in turn led to increased imputation accuracy compared with selection solely of evenly-spaced SNPs. Imputation accuracy increased with LD chip size, and imputation error rate was extremely low for chips with ≥3,000 SNPs. Assuming that genotyping or imputation error occurs at random, imputation error rate can be viewed as the upper limit for genomic prediction error. Our results show that about 25% of the imputation error rate was propagated to genomic prediction in an Angus population. The utility of this MOLO algorithm was also demonstrated in a real application, in which a 6K SNP panel was optimized conditional on 5,260 obligatory SNPs selected based on SNP-trait association in U.S. Holstein animals. With this MOLO algorithm, both imputation error rate and genomic prediction error rate were minimal. |
format | Online Article Text |
id | pubmed-5008792 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-5008792 2016-09-27 Optimal Design of Low-Density SNP Arrays for Genomic Prediction: Algorithm and Applications Wu, Xiao-Lin Xu, Jiaqi Feng, Guofei Wiggans, George R. Taylor, Jeremy F. He, Jun Qian, Changsong Qiu, Jiansheng Simpson, Barry Walker, Jeremy Bauck, Stewart PLoS One Research Article Public Library of Science 2016-09-01 /pmc/articles/PMC5008792/ /pubmed/27583971 http://dx.doi.org/10.1371/journal.pone.0161719 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication. |
title | Optimal Design of Low-Density SNP Arrays for Genomic Prediction: Algorithm and Applications |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5008792/ https://www.ncbi.nlm.nih.gov/pubmed/27583971 http://dx.doi.org/10.1371/journal.pone.0161719 |
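
The description above mentions two quantitative ingredients: a locus-averaged Shannon entropy (LASE) and a Beta distribution that places frame SNPs more densely toward both chromosome ends. The sketch below illustrates both ideas under stated assumptions and is not the authors' implementation. It assumes biallelic SNPs summarized by a single allele frequency, omits the adjustment for uniformity of the SNP distribution, and uses the fixed special case Beta(0.5, 0.5), whose quantile function has the closed form sin²(πu/2), in place of the tunable empirical Beta distribution. The function names `locus_entropy`, `locus_averaged_entropy`, and `frame_positions` are hypothetical.

```python
import math


def locus_entropy(p):
    """Shannon entropy (in bits) of a biallelic locus with allele frequency p.

    Assumption: entropy is computed from allele frequencies; the record
    does not say whether allele or genotype frequencies are used.
    """
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a monomorphic locus carries no information
    q = 1.0 - p
    return -(p * math.log2(p) + q * math.log2(q))


def locus_averaged_entropy(freqs):
    """Mean per-locus entropy over a panel of SNP allele frequencies."""
    if not freqs:
        raise ValueError("freqs must contain at least one allele frequency")
    return sum(locus_entropy(p) for p in freqs) / len(freqs)


def frame_positions(n, chrom_length):
    """Place n frame positions at Beta(0.5, 0.5) quantiles on a chromosome.

    Assumption: Beta(0.5, 0.5) stands in for the tunable Beta distribution.
    Its quantile function is sin^2(pi * u / 2), which is U-shaped in density
    and therefore packs positions more tightly near both ends.
    """
    positions = []
    for i in range(n):
        u = (i + 0.5) / n  # midpoint probabilities, strictly inside (0, 1)
        positions.append(chrom_length * math.sin(math.pi * u / 2.0) ** 2)
    return positions


if __name__ == "__main__":
    # Hypothetical allele frequencies for a five-SNP panel.
    panel = [0.5, 0.3, 0.1, 0.45, 0.2]
    print("LASE (bits):", locus_averaged_entropy(panel))
    # Hypothetical chromosome of length 100 (arbitrary map units).
    print("Frame positions:", frame_positions(4, 100.0))
```

In this sketch, SNPs with allele frequencies near 0.5 contribute the most entropy, which is why an entropy-based objective favors highly informative markers, while the quantile-based placement leaves the narrowest gaps at the chromosome ends.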