
Response Surface Analysis of Genomic Prediction Accuracy Values Using Quality Control Covariates in Soybean

An important and broadly used tool for selection purposes and to increase yield and genetic gain in plant breeding programs is genomic prediction (GP). Genomic prediction is a technique where molecular marker information and phenotypic data are used to predict the phenotype (eg, yield) of individual...


Bibliographic Details
Main Authors:	Jarquín, Diego; Howard, Reka; Graef, George; Lorenz, Aaron
Format:	Online Article Text
Language:	English
Published:	SAGE Publications 2019
Subjects:	Review
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6407170/ https://www.ncbi.nlm.nih.gov/pubmed/30872917 http://dx.doi.org/10.1177/1176934319831307

_version_	1783401486713094144
author	Jarquín, Diego; Howard, Reka; Graef, George; Lorenz, Aaron
author_facet	Jarquín, Diego; Howard, Reka; Graef, George; Lorenz, Aaron
author_sort	Jarquín, Diego
collection	PubMed
description	Genomic prediction (GP) is an important and broadly used tool for selection and for increasing yield and genetic gain in plant breeding programs. Genomic prediction is a technique in which molecular marker information and phenotypic data are used to predict the phenotype (eg, yield) of individuals for which only marker data are available. Higher prediction accuracy can be achieved not only by using efficient models but also by using high-quality molecular marker and phenotypic data. The steps of a typical quality control (QC) of marker data include the elimination of markers with a certain level of minor allele frequency (MAF) or of missing marker values, and the imputation of missing marker values. In this article, we evaluated how prediction accuracy is influenced by the combination of 12 MAF values, 27 different percentages of missing marker values, and 2 imputation techniques (ITs): naïve and Random Forest (RF). We constructed a response surface of prediction accuracy values for the 2 ITs as a function of MAF and percentage of missing marker values using soybean data from the University of Nebraska–Lincoln Soybean Breeding Program. We found that both the genetic architecture of the trait and the IT affect prediction accuracy, implying that QC on the marker data must be performed with care. For corresponding combinations of MAF and percentage of missing values, we observed that RF imputation increased the number of markers by 2 to 5 times compared with the simple naïve imputation method, which is based on the mean allele dosage of the non-missing values at each locus. We conclude that no single strategy (combination of QC criteria and imputation method) outperforms the others for all traits.
format	Online Article Text
id	pubmed-6407170
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	SAGE Publications
record_format	MEDLINE/PubMed
spelling	pubmed-6407170 2019-03-14 Response Surface Analysis of Genomic Prediction Accuracy Values Using Quality Control Covariates in Soybean Jarquín, Diego Howard, Reka Graef, George Lorenz, Aaron Evol Bioinform Online Review SAGE Publications 2019-03-07 /pmc/articles/PMC6407170/ /pubmed/30872917 http://dx.doi.org/10.1177/1176934319831307 Text en © The Author(s) 2019 http://creativecommons.org/licenses/by/4.0/ This article is distributed under the terms of the Creative Commons Attribution 4.0 License (http://www.creativecommons.org/licenses/by/4.0/), which permits any use, reproduction, and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
spellingShingle	Review Jarquín, Diego Howard, Reka Graef, George Lorenz, Aaron Response Surface Analysis of Genomic Prediction Accuracy Values Using Quality Control Covariates in Soybean
title	Response Surface Analysis of Genomic Prediction Accuracy Values Using Quality Control Covariates in Soybean
title_full	Response Surface Analysis of Genomic Prediction Accuracy Values Using Quality Control Covariates in Soybean
title_fullStr	Response Surface Analysis of Genomic Prediction Accuracy Values Using Quality Control Covariates in Soybean
title_full_unstemmed	Response Surface Analysis of Genomic Prediction Accuracy Values Using Quality Control Covariates in Soybean
title_short	Response Surface Analysis of Genomic Prediction Accuracy Values Using Quality Control Covariates in Soybean
title_sort	response surface analysis of genomic prediction accuracy values using quality control covariates in soybean
topic	Review
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6407170/ https://www.ncbi.nlm.nih.gov/pubmed/30872917 http://dx.doi.org/10.1177/1176934319831307
work_keys_str_mv	AT jarquindiego responsesurfaceanalysisofgenomicpredictionaccuracyvaluesusingqualitycontrolcovariatesinsoybean AT howardreka responsesurfaceanalysisofgenomicpredictionaccuracyvaluesusingqualitycontrolcovariatesinsoybean AT graefgeorge responsesurfaceanalysisofgenomicpredictionaccuracyvaluesusingqualitycontrolcovariatesinsoybean AT lorenzaaron responsesurfaceanalysisofgenomicpredictionaccuracyvaluesusingqualitycontrolcovariatesinsoybean
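
The QC steps named in the description (filtering markers on MAF and on the share of missing values, then naïve imputation from the mean allele dosage of the non-missing values at each locus) can be illustrated with a minimal sketch. This is not the authors' code. The function name `qc_and_naive_impute`, the 0/1/2 dosage coding with NaN for missing calls, the threshold values, and the filtering direction (drop markers with MAF below `maf_min` or a missing fraction above `max_missing`) are all assumptions made for illustration. The Random Forest imputation is not shown.

```python
import numpy as np


def qc_and_naive_impute(genotypes, maf_min, max_missing):
    """Hypothetical helper: filter markers, then mean-impute missing calls.

    genotypes: individuals x markers array of allele dosages (0, 1, 2),
               with NaN marking a missing call (assumed coding).
    maf_min: markers with minor allele frequency below this are dropped.
    max_missing: markers with a missing fraction above this are dropped.
    Returns the imputed matrix of kept markers and the boolean keep mask.
    """
    g = np.asarray(genotypes, dtype=float)

    # Fraction of missing calls per marker (column).
    missing_frac = np.isnan(g).mean(axis=0)

    # Mean allele dosage over the non-missing calls at each marker.
    n_obs = (~np.isnan(g)).sum(axis=0)
    dosage_sum = np.nansum(g, axis=0)
    mean_dosage = np.divide(
        dosage_sum, n_obs,
        out=np.full(g.shape[1], np.nan), where=n_obs > 0,
    )

    # Allele frequency is half the mean dosage for diploid 0/1/2 coding;
    # the minor allele frequency is the smaller of p and 1 - p.
    freq = mean_dosage / 2.0
    maf = np.minimum(freq, 1.0 - freq)

    keep = (missing_frac <= max_missing) & (maf >= maf_min)

    # Naive imputation: replace each missing call with its marker's mean dosage.
    kept = g[:, keep].copy()
    rows, cols = np.where(np.isnan(kept))
    kept[rows, cols] = mean_dosage[keep][cols]
    return kept, keep


# Toy data: 4 individuals x 4 markers (illustrative values only).
example = np.array([
    [0.0,    2.0, np.nan, 0.0],
    [1.0,    2.0, 2.0,    0.0],
    [2.0,    2.0, np.nan, 0.0],
    [np.nan, 2.0, np.nan, 1.0],
])
imputed, keep = qc_and_naive_impute(example, maf_min=0.05, max_missing=0.5)
print(keep)
print(imputed)
```

In this toy matrix the second marker is monomorphic and the third is mostly missing, so both are dropped under the assumed thresholds. Repeating the call over a grid of `maf_min` and `max_missing` values gives the number of retained markers for each QC combination, which is the kind of grid over which the article's response surface is defined.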
