
Is single-step genomic REML with the algorithm for proven and young more computationally efficient when less generations of data are present?

Efficient computing techniques allow the estimation of variance components for virtually any traditional dataset. When genomic information is available, variance components can be estimated using genomic REML (GREML). If only a portion of the animals have genotypes, single-step GREML (ssGREML) is th...


Bibliographic Details
Main Authors:	Junqueira, Vinícius Silva; Lourenco, Daniela; Masuda, Yutaka; Cardoso, Fernando Flores; Lopes, Paulo Sávio; Silva, Fabyano Fonseca e; Misztal, Ignacy
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2022
Subjects:	Animal Genetics and Genomics
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9118993/ https://www.ncbi.nlm.nih.gov/pubmed/35289906 http://dx.doi.org/10.1093/jas/skac082

author	Junqueira, Vinícius Silva; Lourenco, Daniela; Masuda, Yutaka; Cardoso, Fernando Flores; Lopes, Paulo Sávio; Silva, Fabyano Fonseca e; Misztal, Ignacy
collection	PubMed
description	Efficient computing techniques allow the estimation of variance components for virtually any traditional dataset. When genomic information is available, variance components can be estimated using genomic REML (GREML). If only a portion of the animals have genotypes, single-step GREML (ssGREML) is the method of choice. The genomic relationship matrix (G) used in both cases is dense, limiting computations depending on the number of genotyped animals. The algorithm for proven and young (APY) can be used to create a sparse inverse of G (G_APY⁻¹) with close to linear memory and computing requirements. In ssGREML, the inverse of the realized relationship matrix (H⁻¹) also includes the inverse of the pedigree relationship matrix, which can be dense with a long pedigree but sparser with a short one. The main purpose of this study was to investigate whether costs of ssGREML can be reduced using APY with truncated pedigree and phenotypes. We also investigated the impact of truncation on variance components estimation when different numbers of core animals are used in APY. Simulations included 150K animals from 10 generations, with selection. Phenotypes (h² = 0.3) were available for all animals in generations 1–9. A total of 30K animals in generations 8 and 9, and 15K validation animals in generation 10 were genotyped for 52,890 SNP. Average information REML and ssGREML with G⁻¹ and G_APY⁻¹ using 1K, 5K, 9K, and 14K core animals were compared. Variance components are impacted when the core group in APY represents the number of eigenvalues explaining a small fraction of the total variation in G. The most time-consuming operation was the inversion of G, with more than 50% of the total time. Next, numerical factorization consumed nearly 30% of the total computing time. On average, a 7% decrease in the computing time for ordering was observed by removing each generation of data. APY can be successfully applied to create the inverse of the genomic relationship matrix used in ssGREML for estimating variance components. To ensure reliable variance component estimation, it is important to use a core size that corresponds to the number of largest eigenvalues explaining around 98% of total variation in G. When APY is used, pedigrees can be truncated to increase the sparsity of H and slightly reduce computing time for ordering and symbolic factorization, with no impact on the estimates.
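The core-size recommendation in the abstract (the number of largest eigenvalues explaining around 98% of the total variation in G) can be illustrated with a short sketch. This is not the authors' code: the function name `core_size_for_variance` and the toy spectrum are hypothetical, and the eigenvalues are assumed to have been computed beforehand, for example with `numpy.linalg.eigvalsh(G)` on a symmetric G.

```python
from itertools import accumulate


def core_size_for_variance(eigenvalues, fraction=0.98):
    """Count the largest eigenvalues needed to reach `fraction` of their total.

    Hypothetical helper for illustration; `eigenvalues` is any iterable of
    eigenvalues of a genomic relationship matrix G.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # G is positive semi-definite in theory, so clip small negative
    # values that can appear through round-off, then sort descending.
    values = sorted((max(float(v), 0.0) for v in eigenvalues), reverse=True)
    total = sum(values)
    if total <= 0.0:
        raise ValueError("eigenvalues must have a positive sum")
    target = fraction * total
    # Walk the cumulative sum until the requested share is reached.
    for count, running in enumerate(accumulate(values), start=1):
        if running >= target:
            return count
    return len(values)  # only reached if round-off keeps the sum below target


# Toy spectrum (made-up numbers, not data from the study).
toy = [60.0, 25.0, 10.0, 4.0, 0.5, 0.5]
print(core_size_for_variance(toy))        # core size at the 98% threshold
print(core_size_for_variance(toy, 0.90))  # a looser 90% threshold
```

Under these assumptions, the returned count would be the number of core animals to request from APY; a threshold well below 98% yields a smaller core, which is the situation the abstract associates with affected variance component estimates.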
format	Online Article Text
id	pubmed-9118993
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-9118993 2022-05-20 J Anim Sci Animal Genetics and Genomics Oxford University Press 2022-03-15 /pmc/articles/PMC9118993/ /pubmed/35289906 http://dx.doi.org/10.1093/jas/skac082 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of the American Society of Animal Science. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Is single-step genomic REML with the algorithm for proven and young more computationally efficient when less generations of data are present?
topic	Animal Genetics and Genomics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9118993/ https://www.ncbi.nlm.nih.gov/pubmed/35289906 http://dx.doi.org/10.1093/jas/skac082
