Optimal breeding-value prediction using a sparse selection index
Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and linkage disequilibrium patterns can lead to heterogeneity in...
Main authors: | Lopez-Cruz, Marco; de los Campos, Gustavo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2021 |
Subjects: | Investigation |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8128408/ https://www.ncbi.nlm.nih.gov/pubmed/33748861 http://dx.doi.org/10.1093/genetics/iyab030 |
_version_ | 1783694108406054912 |
---|---|
author | Lopez-Cruz, Marco; de los Campos, Gustavo |
author_sort | Lopez-Cruz, Marco |
collection | PubMed |
description | Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and linkage disequilibrium patterns can lead to heterogeneity in SNP effects. In this context, calibrating genomic predictions using a large, potentially heterogeneous, training data set may not lead to optimal prediction accuracy. Some studies tried to address this sample size/homogeneity trade-off using training set optimization algorithms; however, this approach assumes that a single training data set is optimum for all individuals in the prediction set. Here, we propose an approach that identifies, for each individual in the prediction set, a subset from the training data (i.e., a set of support points) from which predictions are derived. The methodology that we propose is a sparse selection index (SSI) that integrates selection index methodology with sparsity-inducing techniques commonly used for high-dimensional regression. The sparsity of the resulting index is controlled by a regularization parameter (λ); the G-Best Linear Unbiased Predictor (G-BLUP) (the prediction method most commonly used in plant and animal breeding) appears as a special case which happens when λ = 0. In this study, we present the methodology and demonstrate (using two wheat data sets with phenotypes collected in 10 different environments) that the SSI can achieve significant (anywhere between 5 and 10%) gains in prediction accuracy relative to the G-BLUP. |
format | Online Article Text |
id | pubmed-8128408 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8128408 2021-05-21 Optimal breeding-value prediction using a sparse selection index. Lopez-Cruz, Marco; de los Campos, Gustavo. Genetics. Investigation. Oxford University Press 2021-03-20 /pmc/articles/PMC8128408/ /pubmed/33748861 http://dx.doi.org/10.1093/genetics/iyab030 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of Genetics Society of America. https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial reproduction and distribution of the work, in any medium, provided the original work is not altered or transformed in any way, and that the work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | Optimal breeding-value prediction using a sparse selection index |
topic | Investigation |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8128408/ https://www.ncbi.nlm.nih.gov/pubmed/33748861 http://dx.doi.org/10.1093/genetics/iyab030 |
work_keys_str_mv | AT lopezcruzmarco optimalbreedingvaluepredictionusingasparseselectionindex AT deloscamposgustavo optimalbreedingvaluepredictionusingasparseselectionindex |
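
The description above says that the sparse selection index (SSI) combines a selection index with a sparsity-inducing penalty controlled by λ, and that G-BLUP is recovered at λ = 0. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the index weights for one target individual minimise 0.5·b'(G_trn + θI)b − g'b + λ·Σ|b_j|, where θ is an assumed error-to-genetic variance ratio and g holds the genomic relationships between the training lines and the target. The objective form, the coordinate-descent solver, the simulated genotypes, and all names (`sparse_index_weights`, `theta`, `lam`) are assumptions made for illustration; none of them come from this record.

```python
import numpy as np


def soft_threshold(z, lam):
    """Soft-thresholding operator used by L1-penalised coordinate descent."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def sparse_index_weights(P, q, lam, n_iter=1000, tol=1e-12):
    """Minimise 0.5*b'Pb - q'b + lam*sum(|b|) by cyclic coordinate descent.

    P is assumed symmetric positive definite (here G_trn + theta*I) and
    q holds the relationships between training individuals and the target.
    """
    b = np.zeros(len(q))
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(len(q)):
            # q[j] minus the contribution of every coordinate except j
            r = q[j] - P[j] @ b + P[j, j] * b[j]
            new_bj = soft_threshold(r, lam) / P[j, j]
            max_change = max(max_change, abs(new_bj - b[j]))
            b[j] = new_bj
        if max_change < tol:
            break
    return b


# Hypothetical simulated data: 40 individuals, 200 markers coded 0/1/2.
rng = np.random.default_rng(0)
n, p, theta = 40, 200, 1.0            # theta: assumed variance ratio
X = rng.choice([0.0, 1.0, 2.0], size=(n, p))
Z = X - X.mean(axis=0)                # centre each marker
G = Z @ Z.T / p                       # simple genomic relationship matrix

trn, target = np.arange(n - 1), n - 1  # last individual is the target
P = G[np.ix_(trn, trn)] + theta * np.eye(len(trn))
q = G[trn, target]
y = rng.normal(size=len(trn))         # placeholder training phenotypes

lam_max = np.max(np.abs(q))           # at or above this, all weights are zero
b_gblup = sparse_index_weights(P, q, lam=0.0)
b_closed = np.linalg.solve(P, q)      # closed-form weights at lam = 0
b_sparse = sparse_index_weights(P, q, lam=0.5 * lam_max)
b_null = sparse_index_weights(P, q, lam=lam_max)

print("support points at lam=0:      ", np.count_nonzero(b_gblup))
print("support points at 0.5*lam_max:", np.count_nonzero(b_sparse))
print("predicted value (sparse):     ", b_sparse @ (y - y.mean()))
```

In this sketch the non-zero entries of `b_sparse` play the role of the support points mentioned in the description: each target individual gets its own subset of the training data. In practice λ would be chosen by cross-validation or a similar procedure rather than fixed at half of `lam_max`.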