Approximate Genome-Based Kernel Models for Large Data Sets Including Main Effects and Interactions

The rapid development of molecular markers and sequencing technologies has made it possible to use genomic prediction (GP) and selection (GS) in animal and plant breeding. However, when the number of observations (n) is large (thousands or millions), computational difficulties when handling these la...

Bibliographic Details
Main Authors:	Cuevas, Jaime; Montesinos-López, Osval A.; Martini, J. W. R.; Pérez-Rodríguez, Paulino; Lillemo, Morten; Crossa, Jose
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2020
Subjects:	Genetics
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7594507/ https://www.ncbi.nlm.nih.gov/pubmed/33193659 http://dx.doi.org/10.3389/fgene.2020.567757

_version_	1783601656456282112
author	Cuevas, Jaime; Montesinos-López, Osval A.; Martini, J. W. R.; Pérez-Rodríguez, Paulino; Lillemo, Morten; Crossa, Jose
author_facet	Cuevas, Jaime; Montesinos-López, Osval A.; Martini, J. W. R.; Pérez-Rodríguez, Paulino; Lillemo, Morten; Crossa, Jose
author_sort	Cuevas, Jaime
collection	PubMed
description	The rapid development of molecular markers and sequencing technologies has made it possible to use genomic prediction (GP) and selection (GS) in animal and plant breeding. However, when the number of observations (n) is large (thousands or millions), computational difficulties when handling these large genomic kernel relationship matrices (inverting and decomposing) increase exponentially. This problem increases when genomic × environment interaction and multi-trait kernels are included in the model. In this research we propose selecting a small number of lines m (m < n) for constructing an approximate kernel of lower rank than the original and thus exponentially decreasing the required computing time. First, we describe the full genomic method for single environment (FGSE) with a covariance matrix (kernel) including all n lines. Second, we select m lines and approximate the original kernel for the single environment model (APSE). Similarly, but including main effects and G × E, we explain a full genomic method with genotype × environment model (FGGE), and including m lines, we approximate the kernel method with G × E (APGE). We applied the proposed method to two different wheat data sets of different sizes (n) using the standard linear kernel Genomic Best Linear Unbiased Predictor (GBLUP) and also using eigenvalue decomposition. In both data sets, we compared the prediction performance and computing time for FGSE versus APSE; we also compared FGGE versus APGE. Results showed a competitive prediction performance of the approximated methods with a significant reduction in computing time. Genomic prediction accuracy depends on the decay of the eigenvalues (amount of variance information loss) of the original kernel as well as on the size of the selected lines m.
format	Online Article Text
id	pubmed-7594507
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-7594507 2020-11-13 Approximate Genome-Based Kernel Models for Large Data Sets Including Main Effects and Interactions Cuevas, Jaime; Montesinos-López, Osval A.; Martini, J. W. R.; Pérez-Rodríguez, Paulino; Lillemo, Morten; Crossa, Jose Front Genet Genetics Frontiers Media S.A. 2020-10-15 /pmc/articles/PMC7594507/ /pubmed/33193659 http://dx.doi.org/10.3389/fgene.2020.567757 Text en Copyright © 2020 Cuevas, Montesinos-López, Martini, Pérez-Rodríguez, Lillemo and Crossa. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Approximate Genome-Based Kernel Models for Large Data Sets Including Main Effects and Interactions
topic	Genetics

Similar Items