A comprehensive study on size and definition of the core group in the proven and young algorithm for single-step GBLUP
Main authors: | Abdollahi-Arpanahi, Rostam; Lourenco, Daniela; Misztal, Ignacy |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9123737/ https://www.ncbi.nlm.nih.gov/pubmed/35596130 http://dx.doi.org/10.1186/s12711-022-00726-6 |
_version_ | 1784711614278139904 |
---|---|
author | Abdollahi-Arpanahi, Rostam Lourenco, Daniela Misztal, Ignacy |
collection | PubMed |
description | BACKGROUND: The algorithm for proven and young (APY) has been suggested as a solution for recursively computing a sparse representation of the inverse of a large genomic relationship matrix (G). In APY, a subset of genotyped individuals is used as the core and the remaining genotyped individuals are used as noncore. The size and definition of the core are relevant research subjects for the application of APY, especially given the ever-increasing number of genotyped individuals. METHODS: The aim of this study was to investigate several core definitions: the most popular animals (MPA) (i.e., animals with high contributions to the genetic pool), the least popular males (LPM), the least popular females (LPF), a random set (Rnd), animals evenly distributed across genealogical paths (Ped), unrelated individuals (Unrel), and cores based on within-family selection (Fam) or on decomposition of the gene content matrix (QR). Each definition was evaluated at six core sizes in terms of the prediction accuracy of single-step genomic best linear unbiased prediction (ssGBLUP) with APY. Prediction accuracy of ssGBLUP with the full inverse of G was used as the baseline. The dataset included 357k pedigreed Duroc pigs (111k of them genotyped) and ~220k phenotypic records. RESULTS: When the core size was equal to the number of largest eigenvalues explaining 50% of the variation of G (n = 160), the MPA and Ped core definitions delivered the highest average prediction accuracies (~0.41 to 0.53). As the core size increased to the number of eigenvalues explaining 99% of the variation of G (n = 7320), prediction accuracy was nearly identical for all core types, and correlations with genomic estimated breeding values (GEBV) from ssGBLUP with the full inverse of G were greater than 0.99 for all core definitions. Cores that represent all generations, such as Rnd, Ped, Fam, and Unrel, were grouped together in the hierarchical clustering of GEBV. 
CONCLUSIONS: For small core sizes, the definition of the core matters; however, as the size of the core reaches an optimal value equal to the number of largest eigenvalues explaining 99% of the variation of G, the definition of the core becomes arbitrary. |
format | Online Article Text |
id | pubmed-9123737 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9123737 2022-05-22 A comprehensive study on size and definition of the core group in the proven and young algorithm for single-step GBLUP Abdollahi-Arpanahi, Rostam Lourenco, Daniela Misztal, Ignacy Genet Sel Evol Research Article BioMed Central 2022-05-20 /pmc/articles/PMC9123737/ /pubmed/35596130 http://dx.doi.org/10.1186/s12711-022-00726-6 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | A comprehensive study on size and definition of the core group in the proven and young algorithm for single-step GBLUP |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9123737/ https://www.ncbi.nlm.nih.gov/pubmed/35596130 http://dx.doi.org/10.1186/s12711-022-00726-6 |
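The abstract describes APY as splitting genotyped animals into core and noncore groups, with the core sized by the number of largest eigenvalues of G that explain 50% or 99% of its variation. The following Python sketch illustrates both ideas on a small simulated G. It is not the authors' code, and it assumes NumPy is available. The function names `simulate_G`, `core_size_for_variance` and `apy_inverse` are hypothetical. The genotypes are random. The 0.99/0.01 blend with an identity matrix is an assumed way to make the centered G invertible. The core is drawn at random, which mimics only the Rnd definition. The APY inverse uses the usual form: the core block is inverted exactly, and each noncore animal is conditioned on the core alone, with its residual variance placed on a diagonal matrix.

```python
import numpy as np


def simulate_G(n_animals: int, n_snps: int, seed: int = 0) -> np.ndarray:
    """Toy VanRaden-style genomic relationship matrix from random 0/1/2 genotypes (hypothetical data)."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.05, 0.5, size=n_snps)
    M = rng.binomial(2, freqs, size=(n_animals, n_snps)).astype(float)
    p = M.mean(axis=0) / 2.0                       # observed allele frequencies
    Z = M - 2.0 * p                                # center gene content by 2p
    G = Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))
    # Centered G is singular; the 0.99/0.01 identity blend is an assumed fix.
    return 0.99 * G + 0.01 * np.eye(n_animals)


def core_size_for_variance(G: np.ndarray, fraction: float) -> int:
    """Number of largest eigenvalues of G needed to explain `fraction` of its variation."""
    eigvals = np.clip(np.linalg.eigvalsh(G)[::-1], 0.0, None)  # descending; drop round-off negatives
    explained = np.cumsum(eigvals) / eigvals.sum()
    return int(min(np.searchsorted(explained, fraction) + 1, len(eigvals)))


def apy_inverse(G: np.ndarray, core: np.ndarray) -> np.ndarray:
    """APY inverse of G: exact inverse on the core, noncore animals conditioned on the core only."""
    n = G.shape[0]
    noncore = np.setdiff1d(np.arange(n), core)
    Gcc_inv = np.linalg.inv(G[np.ix_(core, core)])
    Gcn = G[np.ix_(core, noncore)]
    P = Gcc_inv @ Gcn                              # regression of each noncore animal on the core
    # m_i = g_ii - g_ic Gcc^{-1} g_ci: residual variance of noncore animal i given the core
    m_inv = 1.0 / (np.diag(G)[noncore] - np.einsum("ij,ij->j", Gcn, P))

    Ginv = np.zeros((n, n))
    Ginv[np.ix_(core, core)] = Gcc_inv + P @ (m_inv[:, None] * P.T)
    Ginv[np.ix_(core, noncore)] = -P * m_inv[None, :]
    Ginv[np.ix_(noncore, core)] = (-P * m_inv[None, :]).T
    Ginv[np.ix_(noncore, noncore)] = np.diag(m_inv)
    return Ginv


if __name__ == "__main__":
    G = simulate_G(n_animals=300, n_snps=3000)
    for fraction in (0.50, 0.90, 0.99):
        print(fraction, core_size_for_variance(G, fraction))

    k = core_size_for_variance(G, 0.90)
    rng = np.random.default_rng(1)
    core = np.sort(rng.choice(G.shape[0], size=k, replace=False))  # random core (Rnd-like)
    Ginv_apy = apy_inverse(G, core)
    Ginv_full = np.linalg.inv(G)
    rel_diff = np.linalg.norm(Ginv_apy - Ginv_full) / np.linalg.norm(Ginv_full)
    print("core size:", k, "relative difference to full inverse:", rel_diff)
```

Only the core block needs a dense inverse. The noncore part contributes a diagonal, which is what lets APY scale to many genotyped animals. For readability, this toy version still stores a dense n × n result, whereas a large-scale implementation would typically keep the factored form. When the core contains every animal, or all but one, the result matches `np.linalg.inv(G)`.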