
A comprehensive study on size and definition of the core group in the proven and young algorithm for single-step GBLUP


Bibliographic Details
Main authors: Abdollahi-Arpanahi, Rostam; Lourenco, Daniela; Misztal, Ignacy
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9123737/
https://www.ncbi.nlm.nih.gov/pubmed/35596130
http://dx.doi.org/10.1186/s12711-022-00726-6
author Abdollahi-Arpanahi, Rostam
Lourenco, Daniela
Misztal, Ignacy
collection PubMed
description BACKGROUND: The algorithm for proven and young (APY) has been suggested as a solution for recursively computing a sparse representation for the inverse of a large genomic relationship matrix (G). In APY, a subset of genotyped individuals is used as the core and the remaining genotyped individuals are used as noncore. Size and definition of the core are relevant research subjects for the application of APY, especially given the ever-increasing number of genotyped individuals.
METHODS: The aim of this study was to investigate several core definitions, including the most popular animals (MPA) (i.e., animals with high contributions to the genetic pool), the least popular males (LPM), the least popular females (LPF), a random set (Rnd), animals evenly distributed across genealogical paths (Ped), unrelated individuals (Unrel), or based on within-family selection (Fam), or on decomposition of the gene content matrix (QR). Each definition was evaluated for six core sizes based on prediction accuracy of single-step genomic best linear unbiased prediction (ssGBLUP) with APY. Prediction accuracy of ssGBLUP with the full inverse of G was used as the baseline. The dataset consisted of 357k pedigreed Duroc pigs with 111k pigs with genotypes and ~220k phenotypic records.
RESULTS: When the core size was equal to the number of largest eigenvalues explaining 50% of the variation of G (n = 160), MPA and Ped core definitions delivered the highest average prediction accuracies (~0.41 to 0.53). As the core size increased to the number of eigenvalues explaining 99% of the variation in G (n = 7320), prediction accuracy was nearly identical for all core types and correlations with genomic estimated breeding values (GEBV) from ssGBLUP with the full inversion of G were greater than 0.99 for all core definitions. Cores that represent all generations, such as Rnd, Ped, Fam, and Unrel, were grouped together in the hierarchical clustering of GEBV.
CONCLUSIONS: For small core sizes, the definition of the core matters; however, as the size of the core reaches an optimal value equal to the number of largest eigenvalues explaining 99% of the variation of G, the definition of the core becomes arbitrary.
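As a rough illustration of the two ideas in the abstract above (a core size set by the number of largest eigenvalues explaining a given share of the variation in G, and a sparse inverse built from a core/noncore split), the following Python sketch may help. It is not taken from the article: the function names `core_size_for_variance` and `apy_inverse`, the use of NumPy, and the block form of the APY inverse (as it is commonly written in the APY literature) are assumptions made for illustration. Dense matrices are used, so this is only suitable for small toy examples, and G is assumed to be symmetric positive definite.

```python
import numpy as np


def core_size_for_variance(G, fraction):
    """Number of largest eigenvalues of G needed to explain `fraction` of its variation.

    Hypothetical helper: "variation explained" is taken here to mean the
    cumulative share of the eigenvalue sum (the trace of G).
    """
    eigvals = np.linalg.eigvalsh(G)[::-1]        # eigenvalues, largest first
    eigvals = np.clip(eigvals, 0.0, None)        # guard against tiny negative round-off
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # First position where the cumulative share reaches the target, as a count.
    count = int(np.searchsorted(cumulative, fraction)) + 1
    return min(count, len(eigvals))


def apy_inverse(G, core):
    """Dense sketch of an APY-style sparse inverse for a given list of core indices.

    Assumed block form (c = core, n = noncore):
        [[Gcc^-1 + P M^-1 P',  -P M^-1],
         [-M^-1 P',             M^-1  ]]
    with P = Gcc^-1 Gcn and M diagonal, m_i = g_ii - g_ic Gcc^-1 g_ci.
    """
    size = G.shape[0]
    core = np.asarray(core)
    noncore = np.setdiff1d(np.arange(size), core)

    Gcc_inv = np.linalg.inv(G[np.ix_(core, core)])
    Gcn = G[np.ix_(core, noncore)]
    P = Gcc_inv @ Gcn                             # shape: (n_core, n_noncore)

    # Diagonal of M: one conditional variance per noncore individual.
    m = np.diag(G)[noncore] - np.einsum("ij,ij->j", Gcn, P)

    G_inv = np.zeros((size, size))
    G_inv[np.ix_(core, core)] = Gcc_inv + (P / m) @ P.T
    G_inv[np.ix_(core, noncore)] = -P / m
    G_inv[np.ix_(noncore, core)] = (-P / m).T
    G_inv[noncore, noncore] = 1.0 / m             # noncore block stays diagonal
    return G_inv
```

In this sketch the noncore-by-noncore block is diagonal, which is where the sparsity comes from. A call such as `core_size_for_variance(G, 0.99)` would give a candidate core size in the spirit of the 99% threshold discussed in the abstract; which individuals to put in the core is a separate choice that this sketch leaves to the caller.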
format Online
Article
Text
id pubmed-9123737
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9123737 2022-05-22 Genet Sel Evol Research Article BioMed Central 2022-05-20 /pmc/articles/PMC9123737/ /pubmed/35596130 http://dx.doi.org/10.1186/s12711-022-00726-6 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title A comprehensive study on size and definition of the core group in the proven and young algorithm for single-step GBLUP
topic Research Article