
Predicted Residual Error Sum of Squares of Mixed Models: An Application for Genomic Prediction

Genomic prediction is a statistical method to predict phenotypes of polygenic traits using high-throughput genomic data. Most diseases and behaviors in humans and animals are polygenic traits. The majority of agronomic traits in crops are also polygenic. Accurate prediction of these traits can help...


Bibliographic Details
Main Author: Xu, Shizhong
Format: Online Article Text
Language: English
Published: Genetics Society of America 2017
Subjects: Genomic Selection
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5345720/
https://www.ncbi.nlm.nih.gov/pubmed/28108552
http://dx.doi.org/10.1534/g3.116.038059
author Xu, Shizhong
collection PubMed
description Genomic prediction is a statistical method to predict phenotypes of polygenic traits using high-throughput genomic data. Most diseases and behaviors in humans and animals are polygenic traits. The majority of agronomic traits in crops are also polygenic. Accurate prediction of these traits can help medical professionals diagnose acute diseases and breeders to increase food products, and therefore significantly contribute to human health and global food security. The best linear unbiased prediction (BLUP) is an important tool to analyze high-throughput genomic data for prediction. However, to judge the efficacy of the BLUP model with a particular set of predictors for a given trait, one has to provide an unbiased mechanism to evaluate the predictability. Cross-validation (CV) is an essential tool to achieve this goal, where a sample is partitioned into K parts of roughly equal size, one part is predicted using parameters estimated from the remaining K – 1 parts, and eventually every part is predicted using a sample excluding that part. Such a CV is called the K-fold CV. Unfortunately, CV presents a substantial increase in computational burden. We developed an alternative method, the HAT method, to replace CV. The new method corrects the estimated residual errors from the whole sample analysis using the leverage values of a hat matrix of the random effects to achieve the predicted residual errors. Properties of the HAT method were investigated using seven agronomic and 1000 metabolomic traits of an inbred rice population. Results showed that the HAT method is a very good approximation of the CV method. The method was also applied to 10 traits in 1495 hybrid rice with 1.6 million SNPs, and to human height of 6161 subjects with roughly 0.5 million SNPs of the Framingham heart study data. Predictabilities of the HAT and CV methods were all similar. The HAT method allows us to easily evaluate the predictabilities of genomic prediction for large numbers of traits in very large populations.
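The idea in the description, correcting whole-sample residuals by hat-matrix leverages instead of refitting for every held-out part, can be illustrated with a minimal NumPy sketch. This is not the article's implementation. It assumes a simplified setting: no fixed effects, a kinship-like matrix K built from simulated markers, a variance ratio `lam` that is treated as known and held fixed, and the leave-one-out special case of K-fold CV. The function names `hat_press` and `loo_press` and all simulated data are hypothetical. Under these assumptions the leverage-corrected residuals match explicit leave-one-out residuals; when `lam` is re-estimated within each fold, the correction would only be an approximation.

```python
import numpy as np

def hat_press(K, y, lam):
    """PRESS from one whole-sample fit (assumed setup: H = K (K + lam*I)^-1)."""
    n = len(y)
    H = K @ np.linalg.inv(K + lam * np.eye(n))   # hat matrix of the random effects
    resid = y - H @ y                            # estimated residual errors
    pred_resid = resid / (1.0 - np.diag(H))      # leverage-corrected (predicted) residuals
    return float(np.sum(pred_resid ** 2))

def loo_press(K, y, lam):
    """PRESS from explicit leave-one-out CV with the same fixed lam (n refits)."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i                 # training set excludes individual i
        K_tt = K[np.ix_(keep, keep)]             # training-by-training block
        k_it = K[i, keep]                        # held-out-by-training row
        alpha = np.linalg.solve(K_tt + lam * np.eye(n - 1), y[keep])
        press += (y[i] - k_it @ alpha) ** 2      # squared predicted residual
    return float(press)

# Simulated data, purely illustrative (not from the article).
rng = np.random.default_rng(0)
n, m = 60, 200
Z = rng.standard_normal((n, m))                  # hypothetical marker matrix
K = Z @ Z.T / m                                  # kinship-like matrix
y = 0.1 * (Z @ rng.standard_normal(m)) + rng.standard_normal(n)
lam = 1.0                                        # assumed residual/genetic variance ratio

press_hat = hat_press(K, y, lam)
press_loo = loo_press(K, y, lam)
print(press_hat, press_loo)
```

The single-fit version needs one n-by-n inverse, whereas the explicit loop solves n separate systems, which is the computational saving the description refers to.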
format Online
Article
Text
id pubmed-5345720
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Genetics Society of America
record_format MEDLINE/PubMed
spelling pubmed-5345720 2017-03-21 Predicted Residual Error Sum of Squares of Mixed Models: An Application for Genomic Prediction Xu, Shizhong G3 (Bethesda) Genomic Selection Genetics Society of America 2017-01-19 /pmc/articles/PMC5345720/ /pubmed/28108552 http://dx.doi.org/10.1534/g3.116.038059 Text en Copyright © 2017 Xu http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Predicted Residual Error Sum of Squares of Mixed Models: An Application for Genomic Prediction
topic Genomic Selection