
Optimal breeding-value prediction using a sparse selection index

Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and linkage disequilibrium patterns can lead to heterogeneity in...


Bibliographic Details
Main authors: Lopez-Cruz, Marco, de los Campos, Gustavo
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8128408/
https://www.ncbi.nlm.nih.gov/pubmed/33748861
http://dx.doi.org/10.1093/genetics/iyab030
_version_ 1783694108406054912
author Lopez-Cruz, Marco
de los Campos, Gustavo
author_facet Lopez-Cruz, Marco
de los Campos, Gustavo
author_sort Lopez-Cruz, Marco
collection PubMed
description Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and linkage disequilibrium patterns can lead to heterogeneity in SNP effects. In this context, calibrating genomic predictions using a large, potentially heterogeneous, training data set may not lead to optimal prediction accuracy. Some studies tried to address this sample size/homogeneity trade-off using training set optimization algorithms; however, this approach assumes that a single training data set is optimum for all individuals in the prediction set. Here, we propose an approach that identifies, for each individual in the prediction set, a subset from the training data (i.e., a set of support points) from which predictions are derived. The methodology that we propose is a sparse selection index (SSI) that integrates selection index methodology with sparsity-inducing techniques commonly used for high-dimensional regression. The sparsity of the resulting index is controlled by a regularization parameter (λ); the G-Best Linear Unbiased Predictor (G-BLUP) (the prediction method most commonly used in plant and animal breeding) appears as a special case which happens when λ = 0. In this study, we present the methodology and demonstrate (using two wheat data sets with phenotypes collected in 10 different environments) that the SSI can achieve significant (anywhere between 5 and 10%) gains in prediction accuracy relative to the G-BLUP.
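The description above can be illustrated with a minimal sketch. This record does not state the objective function, so the L1-penalized form used here is an assumption inferred from the abstract: for one prediction individual, the index weights b minimize 0.5*b'Pb - g'b + lam*||b||_1, where P is the genomic relationship matrix of the training set plus a variance-ratio term on the diagonal, and g holds the genomic relationships between the training individuals and the target. The function names, the simulated data, the variance ratio `theta`, and the coordinate-descent solver are all hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np


def soft_threshold(z, lam):
    """Soft-thresholding operator used by L1-penalized coordinate descent."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def sparse_index_weights(P, g, lam, n_iter=1000, tol=1e-10):
    """Minimize 0.5*b'Pb - g'b + lam*||b||_1 by cyclic coordinate descent.

    Assumed formulation (not taken from the record): P is symmetric
    positive definite, g is the covariance between training phenotypes
    and the target genetic value, up to a common scale factor.
    """
    b = np.zeros(len(g))
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(len(g)):
            # Gradient term with coordinate j's own contribution removed.
            r = g[j] - P[j] @ b + P[j, j] * b[j]
            new_bj = soft_threshold(r, lam) / P[j, j]
            max_change = max(max_change, abs(new_bj - b[j]))
            b[j] = new_bj
        if max_change < tol:
            break
    return b


# Toy data: simulated marker scores, purely illustrative.
rng = np.random.default_rng(0)
n, p, n_trn = 35, 200, 30
X = rng.standard_normal((n, p))
Z = (X - X.mean(axis=0)) / X.std(axis=0)
G = Z @ Z.T / p                      # genomic relationship matrix
theta = 1.0                          # assumed error/genetic variance ratio
y_trn = rng.standard_normal(n_trn)
y_trn -= y_trn.mean()                # centered training phenotypes

P = G[:n_trn, :n_trn] + theta * np.eye(n_trn)
g = G[:n_trn, n_trn]                 # relationships to the first test individual

b_gblup = sparse_index_weights(P, g, lam=0.0)                      # dense index
b_sparse = sparse_index_weights(P, g, lam=0.5 * np.abs(g).max())   # sparse index

print("support points:", np.count_nonzero(b_sparse), "of", n_trn)
print("G-BLUP prediction:", b_gblup @ y_trn)
print("SSI prediction:", b_sparse @ y_trn)
```

Under this assumed formulation, setting `lam=0.0` gives the unpenalized solution of P b = g, which corresponds to the G-BLUP weights for that individual. Larger values of `lam` set more weights exactly to zero, and the training individuals that keep a nonzero weight play the role of the support points mentioned in the description. In practice the penalty would be chosen by cross-validation rather than fixed as it is here.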
format Online
Article
Text
id pubmed-8128408
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8128408 2021-05-21 Optimal breeding-value prediction using a sparse selection index Lopez-Cruz, Marco de los Campos, Gustavo Genetics Investigation Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and linkage disequilibrium patterns can lead to heterogeneity in SNP effects. In this context, calibrating genomic predictions using a large, potentially heterogeneous, training data set may not lead to optimal prediction accuracy. Some studies tried to address this sample size/homogeneity trade-off using training set optimization algorithms; however, this approach assumes that a single training data set is optimum for all individuals in the prediction set. Here, we propose an approach that identifies, for each individual in the prediction set, a subset from the training data (i.e., a set of support points) from which predictions are derived. The methodology that we propose is a sparse selection index (SSI) that integrates selection index methodology with sparsity-inducing techniques commonly used for high-dimensional regression. The sparsity of the resulting index is controlled by a regularization parameter (λ); the G-Best Linear Unbiased Predictor (G-BLUP) (the prediction method most commonly used in plant and animal breeding) appears as a special case which happens when λ = 0. In this study, we present the methodology and demonstrate (using two wheat data sets with phenotypes collected in 10 different environments) that the SSI can achieve significant (anywhere between 5 and 10%) gains in prediction accuracy relative to the G-BLUP. Oxford University Press 2021-03-20 /pmc/articles/PMC8128408/ /pubmed/33748861 http://dx.doi.org/10.1093/genetics/iyab030 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of Genetics Society of America.
https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial reproduction and distribution of the work, in any medium, provided the original work is not altered or transformed in any way, and that the work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Investigation
Lopez-Cruz, Marco
de los Campos, Gustavo
Optimal breeding-value prediction using a sparse selection index
title Optimal breeding-value prediction using a sparse selection index
title_sort optimal breeding-value prediction using a sparse selection index
topic Investigation
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8128408/
https://www.ncbi.nlm.nih.gov/pubmed/33748861
http://dx.doi.org/10.1093/genetics/iyab030
work_keys_str_mv AT lopezcruzmarco optimalbreedingvaluepredictionusingasparseselectionindex
AT deloscamposgustavo optimalbreedingvaluepredictionusingasparseselectionindex