Incorporating Prior Knowledge of Principal Components in Genomic Prediction

Genomic prediction using a large number of markers is challenging, due to the curse of dimensionality as well as multicollinearity arising from linkage disequilibrium between markers. Several methods have been proposed to solve these problems such as Principal Component Analysis (PCA) that is common...
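
The dimension-reduction idea in the summary above can be illustrated with a minimal sketch, assuming NumPy and a toy genotype matrix generated at random. The function names `pc_scores` and `pc_regression` are hypothetical, and the regression step is ordinary least squares on a subset of PCs, not the Bayesian priors evaluated in the article. It only shows that PC scores are mutually orthogonal and can stand in for a much larger set of correlated markers.

```python
import numpy as np

def pc_scores(markers, n_components):
    """Return (scores, eigenvalues) for the leading principal components."""
    centered = markers - markers.mean(axis=0)
    # SVD of the centered marker matrix: centered = U @ diag(s) @ Vt
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    # Eigenvalues of the sample covariance matrix of the markers
    eigenvalues = s[:n_components] ** 2 / (markers.shape[0] - 1)
    return scores, eigenvalues

def pc_regression(markers, phenotypes, n_components):
    """Least-squares fit of phenotypes on an intercept plus the leading PC scores."""
    scores, _ = pc_scores(markers, n_components)
    design = np.column_stack([np.ones(len(phenotypes)), scores])
    coefficients, *_ = np.linalg.lstsq(design, phenotypes, rcond=None)
    return coefficients, design @ coefficients

rng = np.random.default_rng(0)
# Hypothetical toy data: 50 individuals, 200 markers coded 0/1/2
markers = rng.integers(0, 3, size=(50, 200)).astype(float)
phenotypes = markers[:, :5].sum(axis=1) + rng.normal(size=50)

scores, eigenvalues = pc_scores(markers, 10)
coefficients, fitted = pc_regression(markers, phenotypes, 10)
```

Here 200 markers are replaced by 10 uncorrelated predictors. How many PCs to keep, and how strongly to shrink each PC effect, are the modelling choices the article investigates.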

Bibliographic Details
Main Authors:	Hosseini-Vardanjani, Sayed M.; Shariati, Mohammad M.; Moradi Shahrebabak, Hossein; Tahmoorespur, Mojtaba
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2018
Subjects:	Genetics
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6082966/ https://www.ncbi.nlm.nih.gov/pubmed/30116258 http://dx.doi.org/10.3389/fgene.2018.00289

_version_	1783345884692480000
author	Hosseini-Vardanjani, Sayed M.; Shariati, Mohammad M.; Moradi Shahrebabak, Hossein; Tahmoorespur, Mojtaba
author_facet	Hosseini-Vardanjani, Sayed M.; Shariati, Mohammad M.; Moradi Shahrebabak, Hossein; Tahmoorespur, Mojtaba
author_sort	Hosseini-Vardanjani, Sayed M.
collection	PubMed
description	Genomic prediction using a large number of markers is challenging, due to the curse of dimensionality as well as multicollinearity arising from linkage disequilibrium between markers. Several methods have been proposed to address these problems, such as Principal Component Analysis (PCA), which is commonly used to reduce the dimension of the predictor variables by generating orthogonal variables. Usually, the knowledge from PCA is incorporated in genomic prediction by assuming either equal variance for the PCs or a variance proportional to the eigenvalues; both approaches treat the variances as fixed. Here, three prior distributions (normal, scaled-t, and double exponential) were assumed for PC effects in a Bayesian framework with a subset of PCs. These developed PCR models (dPCRm) were compared with routine genomic prediction models (RGPM), i.e., ridge regression, Bayesian ridge regression, BayesA, and BayesB, and with PC regression using a subset of PCs whose variances were predefined as proportional to the eigenvalues (PCR-Eigen). The performance of the methods was compared by simulating a single trait with a heritability of 0.25 on a genome consisting of 3,000 SNPs on three chromosomes, with QTL numbers of 15, 60, and 105. After 500 generations of random mating as the historical population, a population was isolated and mated for another 15 generations. Generations 8 and 9 of the recent population were used as the reference population and the next six generations as validation populations. The accuracy and bias of predictions were evaluated within the reference population and in each of the validation populations. Within the reference population, across the different QTL numbers, the accuracies of dPCRm were similar to those of RGPM (0.536 to 0.664 vs. 0.542 to 0.671) and higher than those of PCR-Eigen (0.504 to 0.641). Across the validation populations, accuracies declined from 0.633 to 0.310, 0.639 to 0.313, and 0.617 to 0.298 using dPCRm, RGPM, and PCR-Eigen, respectively. Prediction biases of dPCRm and RGPM were similar and always much smaller than those of PCR-Eigen. In conclusion, treating PC variances as random variables via prior specification yielded higher accuracy than PCR-Eigen and the same accuracy as RGPM, while using fewer predictors.
format	Online Article Text
id	pubmed-6082966
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-6082966 2018-08-16 Incorporating Prior Knowledge of Principal Components in Genomic Prediction Hosseini-Vardanjani, Sayed M. Shariati, Mohammad M. Moradi Shahrebabak, Hossein Tahmoorespur, Mojtaba Front Genet Genetics Frontiers Media S.A. 2018-08-02 /pmc/articles/PMC6082966/ /pubmed/30116258 http://dx.doi.org/10.3389/fgene.2018.00289 Text en Copyright © 2018 Hosseini-Vardanjani, Shariati, Moradi Shahrebabak and Tahmoorespur. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle	Genetics Hosseini-Vardanjani, Sayed M. Shariati, Mohammad M. Moradi Shahrebabak, Hossein Tahmoorespur, Mojtaba Incorporating Prior Knowledge of Principal Components in Genomic Prediction
title	Incorporating Prior Knowledge of Principal Components in Genomic Prediction
title_sort	incorporating prior knowledge of principal components in genomic prediction
topic	Genetics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6082966/ https://www.ncbi.nlm.nih.gov/pubmed/30116258 http://dx.doi.org/10.3389/fgene.2018.00289
work_keys_str_mv	AT hosseinivardanjanisayedm incorporatingpriorknowledgeofprincipalcomponentsingenomicprediction AT shariatimohammadm incorporatingpriorknowledgeofprincipalcomponentsingenomicprediction AT moradishahrebabakhossein incorporatingpriorknowledgeofprincipalcomponentsingenomicprediction AT tahmoorespurmojtaba incorporatingpriorknowledgeofprincipalcomponentsingenomicprediction
